Abstract

The k-center problem is a canonical and long-studied facility location and clustering problem with many applications in both its symmetric and asymmetric forms. Both versions of the problem have tight approximation factors on worst-case instances: a 2-approximation for symmetric k-center and an O(log*(k))-approximation for the asymmetric version. Therefore, to improve on these ratios, one must go beyond the worst case.

In this work, we take this approach and provide strong positive results for both the asymmetric and symmetric k-center problems under a very natural input stability (promise) condition called α-perturbation resilience [11], which states that the optimal solution does not change under any α-factor perturbation to the input distances. We show that by assuming 2-perturbation resilience, the exact solution for the asymmetric k-center problem can be found in polynomial time. To our knowledge, this is the first problem that is hard to approximate to any constant factor in the worst case, yet can be optimally solved in polynomial time under perturbation resilience for a constant value of α. Furthermore, we prove our result is tight by showing that symmetric k-center under (2 − ε)-perturbation resilience is hard unless NP = RP. This is the first tight result for any problem under perturbation resilience, i.e., this is the first time the exact value of α for which the problem switches from being NP-hard to efficiently computable has been found.

Our results illustrate a surprising relationship between symmetric and asymmetric k-center instances under perturbation resilience. Unlike the approximation ratio, for which symmetric k-center is easily solved to a factor of 2 but asymmetric k-center cannot be approximated to any constant factor, both symmetric and asymmetric k-center can be solved optimally under resilience to 2-perturbations.
1 Introduction


Overview: Traditionally, the theory of algorithms has focused on the analysis of worst-case instances. While this approach has led to many elegant algorithms and strong lower bounds, it tends to be overly pessimistic about an algorithm's performance on the most typical instances of a problem. A recent line of work in the algorithms community, the so-called beyond worst-case analysis of algorithms, considers the question of designing algorithms for instances that satisfy some natural structural properties, and has given rise to strong positive results [3, 4, 20, 23, 24, 28]. One of the most appealing properties that has been proposed in this space is the stability of the solution to small changes in the input. Bilu and Linial [11] formalized this property in the notion of α-perturbation resilience, which states that the optimal solution does not change under any α-factor perturbation to the input distances.

A large body of work has sought to exploit the power of perturbation resilience in problems such as center-based clustering [4, 9], finding Nash equilibria in game theoretic problems [8], and the traveling salesman problem [26]. These works focus on providing positive results for exactly solving the corresponding optimization problem on perturbation resilient instances, for example, (1 + √2)-perturbation resilience for center-based clustering, and O(√(log n) log log n)-perturbation resilience for max-cut. In this work we continue this line of research and provide a tight result for the canonical and long-studied k-center clustering problem, thereby completely quantifying the power of perturbation resilience for this problem. We show that α = 2 is the threshold at which the problem switches from NP-hard to efficiently computable: specifically, we show that by assuming 2-perturbation resilience, the exact solution for the k-center problem can be found in polynomial time, and that k-center under (2 − ε)-perturbation resilience cannot be solved in polynomial time unless NP = RP. Our results apply to both symmetric and asymmetric k-center, illustrating a surprising relationship between symmetric and asymmetric k-center instances under perturbation resilience. Unlike the approximation ratio, for which symmetric k-center is easily solved to a factor of 2 but asymmetric k-center cannot be approximated to any constant factor, both symmetric and asymmetric k-center can be solved optimally under resilience to 2-perturbations. Overall, this is the first tight result quantifying the power of perturbation resilience for a canonical combinatorial optimization problem.

Our Results: The k-center problem is a canonical and long-studied clustering problem with many applications to facility location, data clustering, image classification, and information retrieval [14, 15, 30]. For example, it can be used to solve the problem of placing k fire stations throughout a city to minimize the maximum time for a fire truck to reach any location, given the pairwise travel times between important locations in the city. In the symmetric k-center problem the distances are assumed to be symmetric, while in the asymmetric k-center problem they are not; however, in both cases they satisfy the triangle inequality. Formally, given a set S of n points, a distance function d : S × S → ℝ≥0 satisfying the triangle inequality (and symmetry in the symmetric case), and an integer k, our goal is to find k centers {c_1, ..., c_k} ⊆ S to minimize $\max_{p \in S} \min_{i \in [k]} d(c_i, p)$.
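To make the objective concrete, here is a minimal brute-force sketch in Python, intended only for tiny instances since it enumerates every k-subset of S. It assumes distances are stored as an n × n nested list with d[c][p] the distance from c to p (so the same code covers the asymmetric case) and that points are the indices 0, ..., n − 1; the function names are hypothetical.

```python
from itertools import combinations

def kcenter_cost(d, centers):
    """k-center objective: max over points p of min over centers c of d(c, p).

    d[c][p] is the distance from c to p, so asymmetric inputs are handled by
    always measuring from the center to the point.
    """
    return max(min(d[c][p] for c in centers) for p in range(len(d)))

def brute_force_kcenter(d, k):
    """Return (optimal cost, centers) by trying every k-subset (exponential)."""
    return min((kcenter_cost(d, cs), cs) for cs in combinations(range(len(d)), k))

# Four points on a line at positions 0, 1, 10, 11 with d(x, y) = |x - y|.
pos = [0, 1, 10, 11]
d = [[abs(a - b) for b in pos] for a in pos]
print(brute_force_kcenter(d, 2))  # (1, (0, 2))
```

Ties between equally good center sets are broken lexicographically by Python's tuple comparison, which is an arbitrary choice of this sketch.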

Both forms of k-center admit tight approximation bounds. For symmetric k-center, several 2-approximation algorithms have been found starting in the mid-1980s. This is the best possible approximation factor (unless P = NP) by a simple reduction from set cover. On the other hand, the asymmetric k-center problem is a prototypical problem where the best known approximation is superconstant and is matched by a lower bound. For the asymmetric k-center problem, an O(log*(n))-approximation algorithm was found by Vishwanathan [30], and later improved to O(log*(k)) by Archer [2]. This approximation ratio was shown to be asymptotically tight by the work of Chuzhoy et al. [15], which built upon a sequence of papers establishing the hardness of approximating uniform hypergraph covering (culminating in [16]).

In this work we consider both symmetric and asymmetric k-center under perturbation resilience and give tight results for both forms. In addition, we consider more robust and weaker variants of perturbation resilience, and give strong results for these problems as well. A summary of our results and the techniques used to achieve them is as follows:
1. Efficient algorithm for symmetric and asymmetric k-center under 2-perturbation resilience. This directly improves over the result of Balcan and Liang [9] for symmetric k-center under (1 + √2)-perturbation resilience. We show that any α-approximation algorithm returns the optimal solution for an α-perturbation resilient instance, thus showing that any 2-approximation algorithm for symmetric k-center returns the optimal solution under 2-perturbation resilience. For the asymmetric result, we first construct a "symmetrized set" by only considering points that demonstrate a rough symmetry. Then we prove strong structural results about the symmetrized set, which motivate a novel algorithm for detecting clusters locally.

2. Hardness of symmetric k-center under (2 − ε)-perturbation resilience. This shows that our perturbation-resilience results are tight for both symmetric and asymmetric k-center. For this hardness result, we use a reduction from a variant of perfect dominating set. To show that this variant is itself hard, we construct a chain of parsimonious reductions (reductions which conserve the number of solutions) starting from 3-dimensional matching to perfect dominating set.

3. Efficient algorithms for symmetric and asymmetric k-center under (3, ε)-perturbation resilience. A clustering instance satisfies (α, ε)-perturbation resilience if at most εn points switch clusters under any α-perturbation. We assume the optimal clusters are of size > 2εn (the problem is NP-hard without this assumption). We show that if any single point p is close to an optimal cluster other than its own, then k − 1 centers achieve the optimal score under a carefully constructed 3-perturbation. Any other point we add to the set of centers must create a clustering that is ε-close to OPT, and we show that all of these sets cannot simultaneously be consistent with one another, thus causing a contradiction. A key concept in our analysis is the notion of a cluster-capturing center, which allows us to reason about which points can capture a cluster when its center is removed.

4. Efficient algorithm for any center-based clustering objective under weak center proximity. Weak center 
proximity asks that each point be closer to its own center than to any point from any other cluster, but 
note that it allows a cluster center to be closer to points from different clusters than to its own. Thus 
it is not at all obvious whether efficient optimal clustering is possible in such a setting. We present a 
novel linkage-based algorithm that is able to do so. It works by iteratively running single linkage as a 
subroutine until all clusters are balanced, and then removing all but the very last link. 

The novelty of our results is manifold. First, our work is the first to provide a tight perturbation resilience result, thereby painting the complete picture for k-center under perturbation resilience. Second, this is the first result where a problem is not approximable to any constant in the worst case, but can be optimally solved under resilience to small constant perturbations. Third, we are the first to consider an asymmetric problem under stability. Our results here illustrate a stark contrast between worst-case analysis and analysis of algorithms under stability. Unlike the approximation ratio, for which symmetric k-center is easily solved to a factor of 2 but asymmetric k-center cannot be approximated to any constant factor, both symmetric and asymmetric k-center can be solved optimally under the same constant level of resilience.

Perturbation resilience has a natural interpretation for both symmetric and asymmetric k-center: it can be viewed as a stability condition in the presence of uncertainties involved in measurements. For example, small fluctuations in the travel time between a fire station and locations in the city, which are caused by different levels of traffic at different times of day, should not drastically affect the optimal placement of fire stations. Furthermore, perturbation resilience can be viewed as a condition on an instance under which the optimal solution satisfies a form of privacy. For instance, if no individual's actions (such as how they drive to work, or the amount of network traffic they are using in a network application) can drastically affect the overall state of the problem, then an individual's actions cannot be detected by looking at the optimal solution.
1.1 Related Work


Perturbation Resilience. There has been substantial recent interest in providing algorithms that circumvent worst-case hardness results on stable, realistic instances. Bilu and Linial defined perturbation resilience [11] and showed algorithms that found the optimal solutions for (1 + ε)-perturbation resilient instances of metric and dense Max-Cut, and O(√n)-perturbation resilient instances of Max-Cut in general. The latter bound was improved by Makarychev et al. [25], who showed that the standard SDP relaxation is integral for O(√(log n) log log n)-perturbation resilient instances. This result is similar in flavor to our Theorems 4 and 11 in that algorithms used in practice give the optimal solution assuming perturbation resilience. They also show that finding an algorithm for o(√(log n))-perturbation resilience would improve the best known approximation for Sparsest Cut, where such an algorithm must either output an optimal solution or certify that the instance is not stable (our Theorem 10 applies to any algorithm that is required to produce the optimal solution). Finally, they showed an algorithm to find the optimal solution for 4-perturbation resilient instances of Minimum Multiway Cut. Awasthi et al. [4] studied α-perturbation resilience under center-based clustering objectives (which include k-median, k-means, and k-center clustering, as well as other objectives), showing an algorithm to optimally solve 3-perturbation resilient instances. Balcan and Liang [9] improved this result, finding an algorithm to optimally solve center-based objectives under (1 + √2)-perturbation resilience. They also studied (α, ε)-perturbation resilience, showing an algorithm that finds near-optimal solutions for k-median under (4, ε)-perturbation resilience, assuming a lower bound on the size of the optimal clusters. Recent work has applied perturbation resilience to other settings to obtain better than worst-case approximation guarantees, including finding Nash equilibria in game theoretic problems [8] and the traveling salesman problem [26].

Now we describe results for a related stability assumption called approximation stability, which is strictly 
stronger than perturbation resilience. 

Approximation Stability. Balcan et al. [6] defined (α, ε)-approximation stability, a related notion that is stronger than perturbation resilience, and showed algorithms that returned near-optimal solutions for k-median and k-means when α > 1, and for min-sum when α > 2. Gupta et al. [19] gave an alternative algorithm for finding near-optimal solutions for k-median under approximation stability as part of their work on finding structure in triangle-dense graphs. The result for the min-sum objective was later improved by Balcan and Braverman to α > 1, while also doing better when there is no lower bound on the size of the optimal clusters. Voevodski et al. [31] gave an algorithm for optimally clustering protein sequences using the min-sum objective under approximation stability, which empirically compares favorably to well-established clustering algorithms.

Other Stability Assumptions. There has also been work on other types of stability assumptions for clustering. Ostrovsky et al. [27] studied a separation condition in which the k-means cost of a clustering instance is much lower than the (k − 1)-means cost. They showed how to efficiently cluster these instances using the Lloyd heuristic with a random farthest traversal seeding. Kumar and Kannan [23] studied a condition in which the projection of any point onto the line between its cluster center and any other cluster center is a large additive factor closer to its own center than to the other center. They showed that using this assumption one can efficiently cluster the data. These results were later improved along several axes by Awasthi and Sheffet. Many other works have shown strong positive results on instances satisfying natural beyond-worst-case notions of stability, on problems ranging from clustering to data privacy to social networks to topic modeling [2, 3, 20, 23, 28].

2 Preliminaries 

We define a clustering instance as (S, d), where S is a set of n points and d : S × S → ℝ≥0 is a distance function. In the k-center problem, the goal is to find a set of points p = {p_1, ..., p_k} ⊆ S called centers such that the maximum distance from any point to its closest center is minimized. More formally, in the k-center problem, given a Voronoi partition $\mathcal{P} = \{P_1, \dots, P_k\}$ induced by a set of centers p = {p_1, ..., p_k} (where p_i ∈ P_i for all 1 ≤ i ≤ k), we define the cost of $\mathcal{P}$ by $\Phi(\mathcal{P}) = \max_{i \in [k]} \max_{v \in P_i} d(p_i, v)$. We denote by OPT the clustering {C_1, ..., C_k} with minimum cost, we denote the optimal centers by {c_1, ..., c_k}, and we denote the optimal cost Φ(OPT) by r*, the maximum cluster radius.
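The Voronoi partition and its cost Φ can be sketched as follows. This is a minimal illustration, assuming distances are an n × n nested list d with d[c][p] the distance from center c to point p; the tie-breaking rule (earlier center wins) is our own assumption, since the text does not fix one, and the function names are hypothetical.

```python
def voronoi_partition(d, centers):
    """Map each center to the points whose closest center it is.

    "Closest" is measured as d(c, p), center first, matching the k-center
    objective; ties go to the earlier center in `centers` (an assumption).
    """
    clusters = {c: [] for c in centers}
    for p in range(len(d)):
        clusters[min(centers, key=lambda c: d[c][p])].append(p)
    return clusters

def phi(d, clusters):
    """Phi(P) = max over i of max over v in P_i of d(p_i, v)."""
    return max(d[c][v] for c, members in clusters.items() for v in members)

# A small asymmetric instance: d(0, 1) = 1 but d(1, 0) = 4.
d = [[0, 1, 5],
     [4, 0, 5],
     [5, 5, 0]]
P = voronoi_partition(d, [0, 2])
print(P, phi(d, P))  # {0: [0, 1], 2: [2]} 1
```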

We study the k-center clustering of instance (S, d) under two types of distance functions, symmetric and asymmetric. A symmetric distance function is a metric. An asymmetric distance function satisfies all the properties of a metric space, except for symmetry. That is, it may be the case that d(p, q) ≠ d(q, p) for some p, q ∈ S. Note that the k-center objective function for asymmetric instances is the same as in the symmetric case, namely the maximum distance from the center to the points, where the order of the arguments now matters.

We consider perturbation resilience, a notion of stability introduced by Bilu and Linial [11]. Perturbation resilience implies that the optimal clustering does not change under small perturbations of the distance measure. Formally, d' is called an α-perturbation of distance function d if, for all p, q ∈ S, d(p, q) ≤ d'(p, q) ≤ αd(p, q).¹ Perturbation resilience is defined formally as follows.

Definition 1. A clustering instance (S, d) satisfies α-perturbation resilience for k-center if, for any α-perturbation d' of d, the optimal k-center clustering under d' is unique and equal to OPT.
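Checking whether a given d' is an α-perturbation of d is a direct translation of the condition d(p, q) ≤ d'(p, q) ≤ αd(p, q). The sketch below assumes nested-list distance matrices and a hypothetical function name; it does not test Definition 1 itself, which quantifies over all α-perturbations.

```python
def is_alpha_perturbation(d, d_prime, alpha):
    """True iff d(p, q) <= d'(p, q) <= alpha * d(p, q) for every ordered pair.

    d' is not required to be a metric, so no symmetry or triangle-inequality
    check is made.
    """
    n = len(d)
    return all(d[p][q] <= d_prime[p][q] <= alpha * d[p][q]
               for p in range(n) for q in range(n))

d = [[0, 2], [2, 0]]
print(is_alpha_perturbation(d, [[0, 3], [4, 0]], 2))  # True: an asymmetric d' is allowed
print(is_alpha_perturbation(d, [[0, 1], [2, 0]], 2))  # False: a distance decreased
```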

Note that the optimal centers may change, but the Voronoi partition C_1, ..., C_k induced by them must stay the same. We do not assume that d' is a metric.² We also consider a more robust variant of α-perturbation resilience, called (α, ε)-perturbation resilience, that allows a small change in the optimal clustering when distances are perturbed. To this end, we say that two clusterings C and C' are ε-close if only an ε-fraction of the input points are clustered differently in the two clusterings, i.e., $\min_{\sigma} \sum_{i=1}^{k} |C_i \setminus C'_{\sigma(i)}| \le \epsilon n$,

where σ is a permutation on [k]. Formally,

Definition 2. A clustering instance (S, d) satisfies (α, ε)-perturbation resilience for k-center if, for any α-perturbation d' of d, any optimal k-center clustering C' under d' is ε-close to OPT.
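The ε-closeness measure can be computed directly from its formula for small k. The sketch below enumerates all k! permutations σ (an assignment solver would scale better); clusterings are assumed to be lists of k Python sets of point indices, and the function names are hypothetical.

```python
from itertools import permutations

def clustering_distance(C, C_prime):
    """min over permutations sigma of sum_i |C_i minus C'_sigma(i)|."""
    k = len(C)
    return min(sum(len(C[i] - C_prime[s[i]]) for i in range(k))
               for s in permutations(range(k)))

def is_eps_close(C, C_prime, eps, n):
    """C and C' are eps-close if at most eps * n points are clustered differently."""
    return clustering_distance(C, C_prime) <= eps * n

OPT = [{0, 1, 2}, {3, 4}]
other = [{2, 3, 4}, {0, 1}]  # OPT with labels swapped and point 2 moved
print(clustering_distance(OPT, other))  # 1
print(is_eps_close(OPT, other, 0.2, 5), is_eps_close(OPT, other, 0.1, 5))  # True False
```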


We use ε-far to denote two clusterings which are not ε-close. We also discuss the strictly stronger notion of approximation stability [5], which requires any α-approximation (not just a Voronoi partition) to be ε-close to OPT. This is formally defined in Section 3.3. In Section 5, we define center-based objectives [9], a more general class of clustering functions which includes objective functions such as k-center, k-median, and k-means. Throughout this work, we use B_r(c) to denote a ball of radius r centered at point c. Also, for a point p and a set D, d(p, D) denotes the distance from p to the farthest point in D.


3 2-perturbation resilience 

In this section, we provide efficient algorithms for finding OPT for symmetric and asymmetric instances of k-center under 2-perturbation resilience. Our result directly improves on the result of Balcan and Liang for symmetric k-center under (1 + √2)-perturbation resilience [9]. We also show that it is NP-hard to recover OPT even for symmetric k-center instances under (2 − ε)-approximation stability. As an immediate consequence, our results are tight for both perturbation resilience and approximation stability, for symmetric and asymmetric k-center instances. This is the first problem for which the exact value of perturbation resilience (α = 2) at which the problem switches from efficiently computable to NP-hard has been found.

In the remainder of this section, we first show that any α-approximation algorithm returns the optimal solution for α-perturbation resilient instances. An immediate consequence is an algorithm for symmetric k-center under 2-perturbation resilience. Then we provide a novel algorithm for asymmetric k-center under 2-perturbation resilience.

1 WLOG, we only consider perturbations in which the distances increase, because we can scale the distances to simulate decreasing distances.

2 This is well-justified, as the data may be gathered from heuristics or an average of measurements.

3.1 Approximation algorithms under perturbation resilience 

The following lemma allows us to reason about a specific type of α-perturbation we construct. This lemma will be important throughout the analysis in this section and in Section 4.

Lemma 3. For all α ≥ 1, let d' be an α-perturbation of d with the following property: for all p, q, if d(p, q) ≥ r*, then d'(p, q) ≥ αr*. Then the optimal cost under d' is αr*.

Proof. Clearly the optimal cost under d' cannot be greater than αr*, since d' is an α-perturbation and the optimal centers under d therefore achieve cost at most αr* under d'. Suppose there exists a set of centers c_1', ..., c_k' under d' that achieves a cost < αr*. Then for all i and all p ∈ C_i', d'(c_i', p) < αr*. But then, by the contrapositive of the assumed property, d(c_i', p) < r*. This implies that c_1', ..., c_k' achieve a cost < r* under d, which is a contradiction. □
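Lemma 3 can be sanity-checked numerically on a toy instance. The sketch below builds one hypothetical d' that satisfies the lemma's premise (every distance that is at least r* is multiplied by α, shorter ones are left unchanged) and compares optimal costs by exhaustive search. The instance, the helper names, and this particular choice of d' are illustrative assumptions, not constructions from the text.

```python
from itertools import combinations

def opt_cost(d, k):
    """Optimal k-center cost by exhaustive search (tiny instances only)."""
    n = len(d)
    return min(max(min(d[c][p] for c in cs) for p in range(n))
               for cs in combinations(range(n), k))

def scale_long_distances(d, r_star, alpha):
    """One d' meeting Lemma 3's premise: multiply every distance >= r* by
    alpha and leave shorter distances unchanged (so d <= d' <= alpha * d)."""
    return [[alpha * x if x >= r_star else x for x in row] for row in d]

pos = [0, 1, 3, 10, 12]
d = [[abs(a - b) for b in pos] for a in pos]
r_star = opt_cost(d, 2)
d_prime = scale_long_distances(d, r_star, 2)
print(r_star, opt_cost(d_prime, 2))  # 2 4, i.e. alpha * r*
```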

The following theorem implies that any α-approximation algorithm for k-center returns the optimal solution on clustering instances that are α-perturbation resilient.

Theorem 4. Let (S, d) be a clustering instance satisfying α-perturbation resilience for asymmetric k-center, and let C be a set of k centers which is an α-approximation, i.e., for all p ∈ S there exists c ∈ C such that d(c, p) ≤ αr*. Then the Voronoi partition induced by C is the optimal clustering.

Proof. For a point p ∈ S, let c(p) := argmin_{c∈C} d(c, p), the closest center in C to p. The idea is to construct an α-perturbation in which C is the optimal solution by increasing all distances except those between c(p) and p, for all p. Then the theorem will follow from the definition of perturbation resilience.

By assumption, d(c(p), p) ≤ αr* for all p ∈ S. Create a perturbation d' as follows. Increase all distances by a factor of α, except that for all p ∈ S, set d'(c(p), p) = min(αd(c(p), p), αr*). Then no distance increased by more than a factor of α. And since d(c(p), p) ≤ αr*, no distance decreased either. Therefore, d' is an α-perturbation of d. Moreover, every pair with d(p, q) ≥ r* has d'(p, q) ≥ αr*, so by Lemma 3 the optimal cost for d' is αr*. Also, C achieves cost ≤ αr* by construction, so C is an optimal set of centers under d'. Then by α-perturbation resilience, the Voronoi partition induced by C under d' is the optimal clustering.

Finally, we show that the Voronoi partition of C under d is the same as the Voronoi partition of C under d'. Given p ∈ S whose closest point in C under d is c(p), under d' all distances from the centers in C \ {c(p)} to p increased by exactly a factor of α, and d(c(p), p) increased by a factor of at most α. Therefore, the closest point in C to p under d' is still c(p). □
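The perturbation built in the proof of Theorem 4 can be written out explicitly. This minimal sketch, with hypothetical helper names and a toy instance chosen for illustration, constructs d' from a set of centers C that is an α-approximation and checks the properties the proof relies on: C costs exactly αr* under d', and its Voronoi partition is the same under d and d'. It does not check perturbation resilience of the instance.

```python
def closest_center(d, centers, p):
    # c(p) := argmin over c in C of d(c, p); ties go to the earlier center.
    return min(centers, key=lambda c: d[c][p])

def partition(d, centers):
    return {c: [p for p in range(len(d)) if closest_center(d, centers, p) == c]
            for c in centers}

def theorem4_perturbation(d, centers, r_star, alpha):
    """Scale every distance by alpha, then cap d'(c(p), p) at alpha * r*."""
    n = len(d)
    d_prime = [[alpha * d[a][b] for b in range(n)] for a in range(n)]
    for p in range(n):
        c = closest_center(d, centers, p)
        if d[c][p] > alpha * r_star:
            raise ValueError("centers are not an alpha-approximation")
        d_prime[c][p] = min(alpha * d[c][p], alpha * r_star)
    return d_prime

pos = [0, 1, 3, 10, 12]
d = [[abs(a - b) for b in pos] for a in pos]
r_star, alpha, C = 2, 2, [0, 4]  # C costs 3 <= alpha * r* = 4 under d
dp = theorem4_perturbation(d, C, r_star, alpha)
cost = max(min(dp[c][p] for c in C) for p in range(len(d)))
print(cost, partition(d, C) == partition(dp, C))  # 4 True
```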

An immediate consequence is that we have exact algorithms for symmetric k-center under 2-perturbation resilience, and for asymmetric k-center under O(log*(k))-perturbation resilience. Now we show it is possible to substantially improve the latter result.
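For the symmetric case, any 2-approximation can play the role of C in Theorem 4. As one concrete choice, swapped in here rather than taken from the text, the sketch below uses Gonzalez's farthest-first traversal, a classic 2-approximation for symmetric k-center. The toy instance is assumed for illustration and is not claimed to be 2-perturbation resilient; on it, the resulting Voronoi partition happens to coincide with the optimal clustering {0, 1, 2}, {3, 4} (r* = 2).

```python
def farthest_first_centers(d, k, first=0):
    """Gonzalez's farthest-first traversal for symmetric k-center: repeatedly
    add the point farthest from the centers chosen so far."""
    n = len(d)
    centers = [first]
    gap = list(d[first])  # gap[p] = distance from p to its nearest chosen center
    while len(centers) < k:
        far = max(range(n), key=lambda p: gap[p])
        centers.append(far)
        gap = [min(gap[p], d[far][p]) for p in range(n)]
    return centers

pos = [0, 1, 3, 10, 12]
d = [[abs(a - b) for b in pos] for a in pos]
C = farthest_first_centers(d, 2)
clusters = {c: [p for p in range(len(d)) if min(C, key=lambda x: d[x][p]) == c] for c in C}
print(C, clusters)  # [0, 4] {0: [0, 1, 2], 4: [3, 4]}
```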

3.2 Asymmetric k-center algorithm

One of the challenges involved in dealing with asymmetric k-center instances is the fact that even though d(c_i, p) ≤ r* for all p ∈ C_i, d(p, c_i) might be arbitrarily large. Such points, for which d(p, c_i) ≫ r*, pose a challenge to the structure of the clusters, as they can be very close to points or even centers of other clusters. To deal with this challenge, we first define a set of "good" points, A = {p | ∀q, d(q, p) ≤ r* ⟹ d(p, q) ≤ r*}. Intuitively speaking, these points behave similarly to a set of points with symmetric distances up to a distance r*. To explore this, we define a desirable property of A with respect to the optimal clustering.
Figure 1: Properties of a 2-perturbation resilient instance of asymmetric k-center that are used for clustering. (a) Properties of 2-perturbation resilience (Property 1 and Property 2, illustrated with the balls B_{r*}(p) and B_{r*}(c_i)). (b) Demonstrating the correctness of Algorithm 1.


Definition 5. A is said to respect the structure of OPT if (1) c_i ∈ A for all i ∈ [k], and (2) for all p ∈ S \ A, if A(p) := argmin_{q∈A} d(q, p) ∈ C_i, then p ∈ C_i.

For all i, define C_i' = C_i ∩ A (which is in fact the optimal clustering of A, although we do not need to prove this). Definition 5 implies that if we can optimally cluster A, then we can optimally cluster the entire instance (formalized in Theorem 8). Thus our goal is to show that A does indeed respect the structure of OPT, and to show how to return C_1', ..., C_k'.

Intuitively, A is similar to a symmetric 2-perturbation resilient clustering instance. However, some structure is no longer there; for instance, a point p may be at distance < 2r* from every point in a different cluster, which is not true for 2-perturbation resilient instances. This implies that we cannot simply run a 2-approximation algorithm on the set A, as we did in the previous section. However, we show that the remaining structural properties are sufficient to optimally cluster A. To this end, we define two properties and show how they lead to an algorithm that returns C_1', ..., C_k', and help us prove that A respects the structure of OPT.

The first of these properties requires each point to be closer to its center than to any point in another cluster. That is, Property (1): for all p ∈ C_i' and q ∈ C_j', i ≠ j, d(c_i, p) < d(q, p). The second property requires that any point within distance r* of a cluster center belongs to that cluster. That is, Property (2): for all i ≠ j and q ∈ C_j, d(q, c_i) > r* (Figure 1a).³
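Both properties are easy to test on a candidate clustering. This minimal sketch assumes clusters are given as a dict from each optimal center c_i to its cluster C_i (a set of indices) and that the symmetrized set A is supplied as a set; following the definitions, Property 1 ranges over C_i' = C_i ∩ A and Property 2 over all of C_j. The function name and the toy instance are hypothetical.

```python
def check_properties(d, clusters, A, r_star):
    """Return (Property 1 holds, Property 2 holds) for the given clustering."""
    prop1 = all(
        d[ci][p] < d[q][p]
        for ci, Ci in clusters.items() for p in Ci & A
        for cj, Cj in clusters.items() if cj != ci for q in Cj & A
    )
    prop2 = all(
        d[q][ci] > r_star
        for ci in clusters
        for cj, Cj in clusters.items() if cj != ci for q in Cj
    )
    return prop1, prop2

pos = [0, 1, 3, 10, 12]
d = [[abs(a - b) for b in pos] for a in pos]
A = set(range(len(d)))  # symmetric distances, so every point is in A
print(check_properties(d, {1: {0, 1, 2}, 3: {3, 4}}, A, 2))  # (True, True)
print(check_properties(d, {1: {0, 1}, 3: {2, 3, 4}}, A, 2))  # (False, False)
```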

Let us illustrate how these properties allow us to optimally cluster A.⁴ Consider a ball of radius r* around a center c_i; by Property 2, such a ball exactly captures C_i'. Furthermore, by Property 1, any point in this ball is closer to the center than to points outside of the ball. Is this true for a ball of radius r* around a general point p? Not necessarily. If this ball contains a point q ∈ C_j' from a different cluster, then q will be closer to a point outside the ball than to p (namely, c_j, which is guaranteed to be outside of the ball by Property 2). This allows us to determine that the center of such a ball must not be an optimal center.

This structure motivates our Algorithm 1 for asymmetric k-center under 2-perturbation resilience. At a high level, we start by constructing the set A (which can easily be done in polynomial time). Then we create the set of all balls of radius r* around all points in A (if r* is not known, we can use a guess-and-check wrapper). Next, we prune this set by throwing out any ball that contains a point which is closer to some point outside the ball than to the ball's center. We also throw out any ball that is a subset of another one. Our claim is that the remaining balls are exactly C_1', ..., C_k'. Finally, we add each point in S \ A to the cluster of its closest point in A.
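The five steps can be sketched in Python as follows. This is a minimal illustration, assuming distances are an n × n nested list with d[p][q] the distance from p to q, points are the indices 0, ..., n − 1, and r* is known (the guess-and-check wrapper is omitted). Two details are our own reading rather than something the text fixes: balls are restricted to A and measured from their center, G_c = {q ∈ A : d(c, q) ≤ r*}, and the outside witness q in the pruning step ranges over A. The function name is hypothetical.

```python
def asymmetric_kcenter_2pr(d, r_star):
    """Sketch of Algorithm 1; returns a list of clusters (sets of indices)."""
    S = range(len(d))
    # Step 1: A = {p | for all q, d(q, p) <= r* implies d(p, q) <= r*}.
    A = [p for p in S if all(d[p][q] <= r_star for q in S if d[q][p] <= r_star)]
    in_A = set(A)
    # Step 2: one ball G_c = {q in A : d(c, q) <= r*} per point c of A.
    balls = {c: frozenset(q for q in A if d[c][q] <= r_star) for c in A}
    # Step 3: drop G_c if some p in G_c is strictly closer to a point of A
    # outside G_c than to c.
    balls = {c: G for c, G in balls.items()
             if not any(d[q][p] < d[c][p] for p in G for q in A if q not in G)}
    # Step 4: drop balls strictly contained in another ball; keep one copy of equal balls.
    kept = {}
    for c, G in balls.items():
        if not any(G < H for H in balls.values()) and G not in kept.values():
            kept[c] = G
    # Step 5: attach each p outside A to the kept set containing its closest point of A.
    clusters = [set(G) for G in kept.values()]
    for p in S:
        if p not in in_A:
            q = min(A, key=lambda s: d[s][p])
            next(G for G in clusters if q in G).add(p)
    return clusters

pos = [0, 1, 3, 10, 12]
d_sym = [[abs(a - b) for b in pos] for a in pos]
print(asymmetric_kcenter_2pr(d_sym, 2))   # [{0, 1, 2}, {3, 4}]
d_asym = [[0, 1, 10], [5, 0, 9], [10, 9, 0]]  # point 1 is not in A
print(asymmetric_kcenter_2pr(d_asym, 1))  # [{0, 1}, {2}]
```

On inputs that are not 2-perturbation resilient, the pruning may leave a number of sets other than k, and step 5 may find no kept set containing q (raising StopIteration); the sketch does not try to detect either case.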

Lemma 6. Properties 1 & 2 hold for asymmetric k-center instances under 2-perturbation resilience. 

Proof sketch. For Property 2, assume that there exist c_i and q ∈ C_j, i ≠ j, such that d(q, c_i) ≤ r*. We construct a 2-perturbation in which q becomes the center for C_i. Increase all distances by a factor of 2, except for the distances from q to C_i, which we increase until they reach 2r* (this never decreases a distance, since d(q, p) ≤ d(q, c_i) + d(c_i, p) ≤ 2r* for all p ∈ C_i). By Lemma 3, this 2-perturbation has optimal cost 2r*. However, q is at distance 2r* from every point in C_i, so replacing c_i by q also yields an optimal set of centers. Then q and c_j are no longer in the same cluster, contradicting 2-perturbation resilience.

The first property was shown to hold for symmetric instances by Awasthi et al., and the same proof can be used for asymmetric instances. We include the proof in Appendix A. □

3 Property (1) first appeared in the work of Awasthi et al. [4], for symmetric clustering instances. A weaker variation of Property (2) was introduced by Balcan and Liang [9], who showed that in (1 + √2)-perturbation resilient instances, for any cluster C_i with radius r_i, B_{r_i}(c_i) = C_i. Our Property (2) shows that this is true for a universal radius, r*, even for 2-perturbation resilient instances, and even for asymmetric instances.

4 Other algorithms work, such as single linkage with dynamic programming at the end to find the minimum cost pruning of k clusters. However, our algorithm is able to recognize optimal clusters locally (without a complete view of the point set).

Algorithm 1 Asymmetric k-center algorithm under 2-PR
Input: Asymmetric k-center instance (S, d), r* (or try all possible candidates).

1. Build set A = {p | ∀q, d(q, p) ≤ r* ⟹ d(p, q) ≤ r*}.

2. ∀c ∈ A, construct G_c = B_{r*}(c) (the ball of radius r* around c).

3. ∀G_c, if ∃p ∈ G_c, q ∉ G_c s.t. d(q, p) < d(c, p), then throw out G_c.

4. ∀p ≠ q s.t. G_p ⊆ G_q, throw out G_p (if G_p = G_q, keep one of the two).

5. ∀p ∉ A, add p to the remaining set that contains q, where q = argmin_{s∈A} d(s, p).

Output: Output the sets G_1, ..., G_k.

Lemma 7. A respects the structure of OPT.

Proof sketch. First we show that c_i ∈ A for all i ∈ [k]. Given c_i, for all p ∈ C_i, d(c_i, p) ≤ r* by definition of OPT. For all q ∉ C_i, d(q, c_i) > r* by Property 2. Therefore, for any point p ∈ S, it cannot be the case that d(p, c_i) ≤ r* and d(c_i, p) > r*, so c_i ∈ A.

Now we show that for all p ∈ S \ A, if A(p) ∈ C_i, then p ∈ C_i. Assume towards contradiction that there exists p ∈ C_i \ A with q = A(p) ∈ C_j for some i ≠ j. Similar to the proof of Lemma 22, we construct a 2-perturbation d' in which q replaces c_j as the center for C_j. We do this by increasing all distances by a factor of 2, except for d(q, q') for all q' ∈ C_j, which we increase by a factor of at most 2, up to 2r*. But then, since all centers are in A and q is the closest point in A to p, it follows that in d', p switches clusters from C_i to C_j, contradicting 2-perturbation resilience. This completes the proof. □

Theorem 8. Algorithm 1 returns the exact solution for asymmetric k-center under 2-perturbation resilience.

Proof. First we must show that after step 4 the remaining sets are exactly C_1', ..., C_k' = C_1 ∩ A, ..., C_k ∩ A. We prove this in three steps: the sets G_{c_i} correspond to C_i', these sets are not thrown out in steps 3 and 4, and all other sets are thrown out in steps 3 and 4. Because of Lemma 22, we can use Properties 1 and 2.

For all i, G_{c_i} = C_i'. From Lemma 23, all centers are in A, so G_{c_i} will be created in step 2. For all p ∈ C_i', d(c_i, p) ≤ r*. For all q ∈ A \ C_i', by Property 2, d(q, c_i) > r* (and since c_i, q ∈ A, d(c_i, q) > r* as well).

For all i, G_{c_i} is not thrown out in step 3. Let s ∈ G_{c_i} and t ∉ G_{c_i}. Then s ∈ C_i' and t ∈ C_j' for some j ≠ i. If d(t, s) < d(c_i, s), then we get a contradiction from Property 1.

For all non-centers p, G_p is thrown out in step 3 or 4. From the previous paragraphs, G_{c_i} = C_i'; let p ∈ C_i'. If G_p ⊆ G_{c_i}, then G_p will be thrown out in step 4 (if G_p = G_{c_i}, it does not matter which set we keep, so WLOG say that we keep G_{c_i}). Then if G_p is not thrown out in step 4, there exists s ∈ G_p ∩ C_j' for some j ≠ i. If s = c_j, then d(p, c_j) ≤ r* and we get a contradiction from Property 2. So we can assume that s is a non-center (and that c_j ∉ G_p). But d(c_j, s) < d(p, s) by Property 1, and therefore G_p will be thrown out in step 3. Thus, the remaining sets after step 4 are exactly C_1', ..., C_k'.

Finally, by Lemma 23, for each p ∈ C_i \ A, A(p) ∈ C_i, so p will be added to G_{c_i}. Therefore, the final output is C_1, ..., C_k. □


3.3 Hardness for k-center under (2 − ε)-approximation stability

In this section, we consider approximation stability, introduced by Balcan et al. [6], which is strictly stronger than perturbation resilience. We show that if symmetric k-center under (2 − ε)-approximation stability can be solved in polynomial time, then NP = RP, even under the condition that all optimal clusters have size at least …. Because approximation stability is stronger than perturbation resilience, this result implies that k-center under (2 − ε)-perturbation resilience is hard as well. Similarly, symmetric k-center is a special case of asymmetric k-center, so we get the same hardness results for asymmetric k-center. This proves that Theorem 8 is tight.

Approximation stability requires every $\alpha$-approximation to the optimal cost to differ from OPT on at most an $\epsilon$-fraction of the points.

Definition 9. A clustering instance $(S, d)$ satisfies $(\alpha, \epsilon)$-approximation stability for $k$-center if, for any partition $\mathcal{C}$ with objective value $r'$ (not necessarily a Voronoi partition), $r' \le \alpha r^*$ implies that $\mathcal{C}$ is $\epsilon$-close to OPT.

It is not hard to see that $(\alpha, \epsilon)$-approximation stability implies $(\alpha, \epsilon)$-perturbation resilience, as the optimal clustering under any $\alpha$-perturbation costs at most $\alpha r^*$ under the original distance function $d$. So a violating instance of $(\alpha, \epsilon)$-perturbation resilience induces a partition which costs at most $\alpha r^*$ and is $\epsilon$-far from OPT, and therefore is not $(\alpha, \epsilon)$-approximation stable.
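To make "$\epsilon$-close" and "$\epsilon$-far" concrete, here is a minimal Python sketch of the clustering distance we assume underlies them (the usual convention in the approximation-stability literature): the minimum, over all one-to-one matchings $\sigma$ between clusters, of $\frac{1}{n}\sum_i |C_i \setminus C'_{\sigma(i)}|$, i.e., the fraction of points that must be reassigned. The function name `clustering_distance` is hypothetical, and brute force over all $k!$ matchings is only meant for small $k$.

```python
from itertools import permutations
from typing import Sequence, Set


def clustering_distance(c1: Sequence[Set[int]], c2: Sequence[Set[int]]) -> float:
    """Fraction of points that must change clusters to turn c1 into c2,
    minimised over all one-to-one matchings of the clusters (brute force)."""
    if len(c1) != len(c2):
        raise ValueError("clusterings must have the same number of clusters")
    n = sum(len(c) for c in c1)  # c1 is assumed to partition the n points
    best = n
    for sigma in permutations(range(len(c2))):
        # points of C_i that are not in the matched cluster C'_{sigma(i)}
        moved = sum(len(c1[i] - c2[sigma[i]]) for i in range(len(c1)))
        best = min(best, moved)
    return best / n
```

Relabelling clusters costs nothing under this measure, while moving one point out of six gives distance $1/6$.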

Theorem 10. There is no polynomial-time algorithm for finding the optimal $k$-center clustering under $(2-\epsilon)$-approximation stability, even when assuming all optimal clusters have size at least $\frac{n}{2k}$, unless NP = RP.

We show a reduction from a special case of Dominating Set which we call Unambiguous-Balanced-Perfect Dominating Set. A reduction from Perfect Dominating Set (Dominating Set with the additional constraint that for all dominating sets of size at most $k$, each vertex is hit by exactly one dominator) to the problem of clustering under $(2-\epsilon)$-center proximity was shown in [10] ($\alpha$-center proximity is the property that for all $p \in C_i$ and $j \neq i$, $\alpha d(c_i, p) < d(c_j, p)$, and it follows from $\alpha$-perturbation resilience). Our contribution is to show that Perfect Dominating Set remains hard under two additional conditions. First, in the case of a YES instance, each dominator must hit at least $\frac{n}{2k}$ vertices (which translates to clusters having size at least $\frac{n}{2k}$ as well). Second, we are promised that there is at most one dominating set of size at most $k$ (which is required for establishing approximation stability for the resulting clustering instance). The details are provided in Appendix B.

4 Robust perturbation resilience 

In this section, we consider $(\alpha, \epsilon)$-perturbation resilience. We show that under $(3, \epsilon)$-perturbation resilience, there is an algorithm that recovers OPT for symmetric $k$-center, and an algorithm that returns a solution that is $\epsilon$-close to OPT for asymmetric $k$-center. For both of these results, we assume a lower bound on the size of the optimal clusters: $|C_i| > 2\epsilon n$ for all $i \in [k]$. We show that the lower bound on cluster sizes is necessary; in its absence, the problem becomes NP-hard for all values of $\alpha > 1$ and $\epsilon > 0$. The theorems in this section require careful reasoning about sets of centers under different perturbations that cannot all simultaneously be valid.

4.1 Symmetric $k$-center

We show that for any $(3, \epsilon)$-perturbation resilient $k$-center instance such that $|C_i| > 2\epsilon n$ for all $i \in [k]$, OPT can be found by simply thresholding the input graph using distance $r^*$ and outputting the connected components. A nice feature of our result is that the Single Linkage algorithm, a fast algorithm widely used in practice, is sufficient to optimally cluster these instances.

Theorem 11. Let $(S, d)$ be a $(3, \epsilon)$-perturbation resilient $k$-center instance where all optimal clusters have size greater than $\max(2\epsilon n, 3)$. Then the optimal clusters in OPT are exactly the connected components of the threshold graph $G_{r^*}$ of the input distances.
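The procedure behind Theorem 11 can be sketched in a few lines of Python. This is a minimal illustration, assuming a symmetric distance matrix `d`, a known (or guessed) radius `r_star`, and an edge between two points whenever their distance is at most $r^*$; the name `threshold_components` is hypothetical, and union-find is just one convenient way to obtain connected components. Equivalently, it is single linkage stopped at distance $r^*$.

```python
from typing import Dict, List, Sequence


def threshold_components(d: Sequence[Sequence[float]], r_star: float) -> List[List[int]]:
    """Connected components of the graph with an edge {p, q} whenever d[p][q] <= r_star."""
    n = len(d)
    parent = list(range(n))

    def find(x: int) -> int:
        # path halving keeps the union-find trees shallow
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for p in range(n):
        for q in range(p + 1, n):
            if d[p][q] <= r_star:
                parent[find(p)] = find(q)

    groups: Dict[int, List[int]] = {}
    for x in range(n):
        groups.setdefault(find(x), []).append(x)
    return sorted(groups.values())
```

For points at positions 0, 1, 2, 10, 11, 12, 20, 21 on a line and $r^* = 1$, this returns the three groups of nearby indices.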


Proof idea. Since each optimal cluster center is within distance $r^*$ of all points in its cluster, it suffices to show that any two points in different clusters are more than $r^*$ apart from each other. Assume on the contrary that there exist $p \in C_i$ and $q \in C_j$, $i \neq j$, such that $d(p, q) \le r^*$. First we find a set of $k+2$ points and a 3-perturbation $d'$ such that every size-$k$ subset of the points is an optimal set of centers under $d'$. Then we show how this leads to a contradiction under $(3, \epsilon)$-perturbation resilience.

From our assumption, $p$ is within distance $3r^*$ of every point in $C_i \cup C_j$ (by the triangle inequality). Under a 3-perturbation in which all distances are blown up by a factor of 3 except $d(p, C_i \cup C_j)$, replacing $c_i$ and $c_j$ with $p$ would still give us a set of $k-1$ centers that achieve the optimal score. But would this contradict $(3, \epsilon)$-perturbation resilience? It would not! Perturbation resilience requires exactly $k$ distinct centers (see footnote 3). The key challenge is to pick a final "dummy" center to guarantee that the Voronoi partition is $\epsilon$-far from OPT. The dummy center might "accidentally" be the closest center for almost all points in $C_i$ or $C_j$. Even worse, the new center might set off a chain reaction in which it becomes the center of a cluster $C_x$, and $c_x$ becomes the center of $C_i$, which would also result in a partition that is not $\epsilon$-far from OPT.

To deal with the chain reactions, we crucially introduce the notion of a cluster capturing center (CCC). $c_x$ is a CCC for $C_y$ if, for all but $\epsilon n$ points $p \in C_y$, $d(c_x, p) \le r^*$ and $d(c_x, p) < d(c_i, p)$ for all $i \neq x, y$. Intuitively, a CCC exists if and only if $c_x$ is a valid center for $C_y$ when $c_y$ is taken out of the set of optimal centers (i.e., a chain reaction will occur). We argue that if a CCC does not exist, then every dummy center we pick must be close to either $C_i$ or $C_j$, since there are no chain reactions. If there does exist a CCC $c_x$ for $C_y$, then we cannot reason about what happens to the dummy centers under our $d'$. However, we can define a new $d''$ by increasing all distances except $d(c_x, C_y)$, which allows us to take $c_y$ out of the set of optimal centers, and then any dummy center must be close to $C_x$ or $C_y$. There are no chain reactions because we already know $c_x$ is the best center for $C_y$ among the original optimal centers. Thus, whether or not there exists a CCC, we can find $k+2$ points close to the entire dataset by picking points from both $C_i$ and $C_j$ (resp. $C_x$ and $C_y$).

Because of the assumption that all clusters have size greater than $2\epsilon n$, for every 3-perturbation there must be a bijection between clusters and centers, where each center is closest to the majority of points in its corresponding cluster. We show that the size-$k$ subsets of the $k+2$ points cannot all simultaneously admit bijections that are consistent with one another.

Formal analysis. We start out with a simple implication of the assumption that $|C_i| > 2\epsilon n$ for all $i$.

Fact 12. Consider a clustering instance that is $(\alpha, \epsilon)$-perturbation resilient for $\alpha > 1$ and whose optimal clusters all have size greater than $2\epsilon n$. Then for any $\alpha$-perturbation $d'$ and any set of optimal centers $c'_1, \dots, c'_k$ under $d'$, for each optimal cluster $C_i$ there must be a unique $c'_i$ which is the center for more than half of the points in $C_i$ under $d'$.


Now we formally define a CCC. 

Definition 13. A center $c_i$ is a first-order cluster-capturing center (CCC) for $C_j$ if, for all $x \neq i, j$, for more than half of the points $p \in C_j$, $d(c_i, p) < d(c_x, p)$ and $d(c_i, p) \le r^*$. $c_i$ is a second-order cluster-capturing center (CCC2) for $C_j$ if there exists a $c_l$ such that, for all $x \neq i, j, l$, for more than half of the points $p \in C_j$, $d(c_i, p) < d(c_x, p)$ and $d(c_i, p) \le r^*$ (see Figure 2).
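A direct check of the first-order condition on an explicit instance might look as follows. This is an illustrative sketch only: we read the quantifiers as "more than half of the points $p \in C_j$ satisfy $d(c_i, p) \le r^*$ and beat every other center $c_x$, $x \neq i, j$", matching the informal description of a CCC earlier in this section. The distance matrix `d[s][t]` (distance from `s` to `t`), the index lists `centers` and `clusters`, and the name `is_first_order_ccc` are all hypothetical, and $r^*$ is passed in rather than computed.

```python
from typing import Sequence, Set


def is_first_order_ccc(d: Sequence[Sequence[float]], centers: Sequence[int],
                       clusters: Sequence[Set[int]], r_star: float,
                       i: int, j: int) -> bool:
    """True if centers[i] captures clusters[j] in the first-order (CCC) sense, as read here."""
    ci = centers[i]
    # every optimal center other than c_i and c_j competes with c_i
    rivals = [centers[x] for x in range(len(centers)) if x not in (i, j)]
    captured = sum(
        1 for p in clusters[j]
        if d[ci][p] <= r_star and all(d[ci][p] < d[cx][p] for cx in rivals)
    )
    # "more than half of the points p in C_j"
    return 2 * captured > len(clusters[j])
```

In the accompanying check, the center at position 1 reaches two of the three points of the middle cluster within distance 3 and beats the far-away third center on them, so it counts as a CCC for that cluster at $r^* = 3$ but not at $r^* = 1$.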

Each cluster $C_j$ can have at most one CCC $c_i$, because $c_i$ is closer than any other center to more than half of $C_j$. Every CCC is a CCC2, since the former is a stronger condition. However, it is possible for a cluster to have multiple CCC2's (see footnote 6). We need CCC2 for the following reason. Assume there exist $p \in C_i$ and $q \in C_j$ which are close, and we replace $c_i$ and $c_j$ with $p$ in the set of centers. Perhaps $C_i$ has a CCC, but it is $c_j$. This is not relevant to our analysis, since $c_j$ was taken out of the set of centers. However, if we know that $c_x$ is a CCC2 for $C_i$ (it is the best center for $C_i$, disregarding $c_j$), then we know that $c_x$ will be the best center for $C_i$ after replacing $c_i$ and $c_j$ with $p$. Now we use this definition to show that if two points from different clusters are close, then all points are close together in some sense. We include the proof in Appendix D.

Figure 2: (a) $c_i$ is a CCC for $C_j$; $c_x$ is a CCC2 for $C_j$. (b) Case 1 of Lemma …, where there exists a CCC2 $c_x$. All distances denoted with a black edge have length at most $r^*$.

3 This distinction is well-motivated; if for some application the best $k$-center solution is to put two centers at the same location, then we could achieve the exact same solution with $k-1$ centers. That implies we should have been running $k'$-center for $k' = k-1$ instead of $k$-center.

6 In fact, a cluster can have at most three CCC2's, but this is not relevant to our analysis.

Lemma 14. Consider a clustering instance satisfying $(3, \epsilon)$-perturbation resilience whose optimal clusters all have size greater than $2\epsilon n$, and assume there are two points from different clusters which are at most $r^*$ apart from each other. Then there exists a partition $S_x \cup S_y$ of $S$ such that $d(p, q) \le 2r^*$ for all $p, q \in S_x$, and similarly $d(p, q) \le 2r^*$ for all $p, q \in S_y$.

So far, we have shown that just by assuming two points from different clusters are close, many points are very close together, in some sense. Now we will show that such an instance cannot satisfy $(3, \epsilon)$-perturbation resilience.

One implication of the last lemma is that there must exist a set of $k+2$ points that are collectively close to every point in the dataset (we will formalize this soon). So there is one 3-perturbation for which any $k$ of these points can be an optimal set of centers. But it is not possible that all $\binom{k+2}{k}$ of these sets of centers simultaneously create partitions that are $\epsilon$-close. We first present this idea in a more general form. We include the proof in Appendix D.

Lemma 15. Let $U$ be a set of $k$ elements and $C$ a set of $k+2$ elements disjoint from $U$, and suppose each $u \in U$ ranks all elements of $C$ without ties. Then it is not possible that, for every set $C' \subset C$ with $|C'| = k$, each $c \in C'$ is ranked highest within $C'$ by exactly one $u \in U$.

Looking ahead, each element in U will correspond to a cluster, and each element in C will correspond 
to a point that becomes an optimal center under a d' we construct. 

Note 16. The lemma still holds even if each $u \in U$ only ranks its top three elements in $C$, and all the rest are tied for fourth. This is because, for each $C'$, $u$ only needs to express its top-ranked element in $C'$. Since $C'$ omits only two elements of $C$, there can be a $C'$ that contains neither of $u$'s first- and second-highest ranked elements, in which case $u$ needs to specify its third-highest ranked element, but it never needs to rank any other elements.

We are almost ready to prove our main lemma. First, we state a fact that helps us apply the setting of Lemma 15 to a clustering instance. We would like to have a table such that each row $i$ is an ordering of all points based on their distance to $C_i$. Then we know that in any optimal set of centers, the point that is the center for $C_i$ is the highest point in the ranking. The following fact shows this ranking is well-defined.

Fact 17. Consider a clustering instance $(S, d)$ satisfying $(\alpha, \epsilon)$-perturbation resilience such that all optimal clusters have size greater than $2\epsilon n$, and an $\alpha$-perturbation with optimal score $x$. For all $i$, there exists a ranking of $S$ such that for all sets $C$ with $|C| = k$ and score $x$, the unique center in $C$ which is closest to all but $\epsilon n$ points in $C_i$ is the highest-ranked point of $C$.

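Proof. Assume the fact is false. Then there exist a cluster $C_i$, two points $p$ and $q$, and two sets of centers $C$ and $C'$ with $p, q \in C$ and $p, q \in C'$, both achieving score $x$, such that $p$ is the center for $C_i$ in $C$ while $q$ is the center for $C_i$ in $C'$. Then $p$ is closer than all other points in $C$ (in particular, closer than $q$) to all but $\epsilon n$ points in $C_i$. Similarly, $q$ is closer than all other points in $C'$ (in particular, closer than $p$) to all but $\epsilon n$ points in $C_i$. Since $|C_i| > 2\epsilon n$, some point of $C_i$ lies in both sets and is strictly closer to $p$ than to $q$ and strictly closer to $q$ than to $p$, a contradiction. □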


Now, using the previous two lemmas, we can prove Theorem 11.

Proof of Theorem 11. It suffices to prove that any two points from different clusters are at distance greater than $r^*$ from each other. Assume towards contradiction that this is not true. Then by Lemma 14, there exists a partition $S_x, S_y$ of $S$ such that $d(p, q) \le 2r^*$ for all $p, q \in S_x$, and similarly for $S_y$. Furthermore, for all $i$, if there exists $p \in C_i \cap S_x$, then all points in $S_x$ are within distance $3r^*$ of $c_i$; otherwise all points in $S_y$ are within distance $3r^*$ of $c_i$.

Construct a set $C$ of size $k+2$ as follows. Pick at least 3 points from $S_x$ and 3 points from $S_y$. The rest of the points may be arbitrary. $S_x$ and $S_y$ must contain at least the clusters $C_x$ and $C_y$, respectively, so here we use the assumption that optimal clusters have size at least 3. Set $C$ has the property that for all $p \in S$, at least 3 points in $C$ are within distance $3r^*$ of $p$. We will use Lemma 15 to show a contradiction.

First we construct a $d'$ in which any size-$k$ subset of $C$ is a valid set of centers:


$$
d'(p,q) = \begin{cases} 3r^* & \text{if } p \in C \text{ and } r^* \le d(p,q) \le 3r^*, \\ 3\,d(p,q) & \text{otherwise.} \end{cases}
$$


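This is a 3-perturbation by construction. By Lemma 3, the optimal score under $d'$ is $3r^*$. Given any set $C' \subset C$ with $|C'| = k$ and any $p \in S$, at most two of the (at least three) points of $C$ within distance $3r^*$ of $p$ are missing from $C'$, so there exists at least one point $q \in C'$ such that $d(q, p) \le 3r^*$, and hence $d'(q, p) \le 3r^*$. Therefore $C'$ is an optimal set of centers.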

Define the set $U = \{C_1, \dots, C_k\}$. From Fact 12, for all $i$, there must be a unique $c_i^{(1)} \in C$ such that for all other points $p \in C$, $c_i^{(1)}$ is closer than $p$ to the majority of points in $C_i$. Similarly, there must be a unique point $c_i^{(2)} \in C \setminus \{c_i^{(1)}\}$ that is closer to the majority of points in $C_i$, or else every $C'$ without $c_i^{(1)}$ would have a contradiction by Fact 12. Finally, when we pick $C' = C \setminus \{c_i^{(1)}, c_i^{(2)}\}$, there must be a unique $c_i^{(3)}$ closer to the majority of points in $C_i$, for the same reason. Let each $C_i \in U$ define its ranking as $c_i^{(1)}, c_i^{(2)}, c_i^{(3)}$, with all the rest tied for fourth. (Because of Note 16, it is okay that we have ties for fourth.) Now we can apply Lemma 15 to $U$ and $C$: there exists a $C'$ in which not every $c \in C'$ is ranked highest by exactly one element of $U$, so, by the pigeonhole principle, some $c \in C'$ is ranked highest by at least two elements $C_i, C_j \in U$. By definition of the rankings, $c$ is the closest point in $C'$ to the majority of points in $C_i$ and in $C_j$. But then (since each cluster has size greater than $2\epsilon n$) the optimal set of centers $C'$ under $d'$ does not induce a clustering that is $\epsilon$-close to OPT. □


Note that Theorem 10 implies that $(2-\delta, \epsilon)$-perturbation resilient $k$-center is hard for $\delta > 0$, even when the optimal clusters are large. Therefore, the value of $\alpha$ we achieve is within one of optimal.


4.2 Lower bound on cluster sizes 

Before moving to the asymmetric case, we show that the lower bound on the cluster sizes in Theorem 11 is necessary. Without this lower bound, clustering becomes hard, even assuming $(\alpha, \epsilon)$-perturbation resilience for any $\alpha$ and $\epsilon$. This reduction follows from $k$-center (the details appear in Appendix D).
Theorem 18. For all $\alpha > 1$ and $\epsilon > 0$, finding the optimal solution for $k$-center under $(\alpha, \epsilon)$-perturbation resilience is NP-hard.

4.3 $(3, \epsilon)$-perturbation resilience for asymmetric $k$-center

In the asymmetric case, we consider the definition of the symmetric set $A$ from Section 3: $A = \{p \mid \forall q,\ d(q, p) \le r^* \implies d(p, q) \le r^*\}$. We might first ask whether $A$ respects the structure of OPT, as it did under 2-perturbation resilience; namely, whether Condition 1 (all centers are in $A$) and Condition 2 ($\arg\min_{q \in A} d(q, p) \in C_i \implies p \in C_i$) hold. Neither condition is guaranteed to hold. We explore to what degree these conditions are violated.

We call a center $c_i$ "bad" if it is not in the set $A$, i.e., there exists $q \notin C_i$ with $d(q, c_i) \le r^*$. When a bad center $c_i$ exists, we can take it out of the set of optimal centers, and we can pick an arbitrary dummy center which must be close to $C_i$ or a CCC for $C_i$. In our symmetric argument, we arrived at a contradiction by showing that two dummy centers which capture the same cluster must be close by the triangle inequality. This logic breaks down for asymmetric distances. In Appendix D we show an example of an instance with a bad center that satisfies $(\alpha, \epsilon)$-perturbation resilience. However, it turns out that no instance can have more than 6 bad centers under $(3, \epsilon)$-perturbation resilience, assuming all optimal clusters have size greater than $2\epsilon n$. This is our main structural result for this section (Lemma 20). So Condition 1 is satisfied for all but a constant number of centers. However, Condition 2 may fail for up to $\epsilon n$ points. Therefore, even if we fully cluster $A$, we will only get $\epsilon$-close to OPT.

Every point in $A$ is within distance $r^*$ of its center and within distance $2r^*$ of its entire cluster. However, as we mentioned, it is possible that 6 centers are not in $A$ (and possibly no points at all from those 6 clusters). This motivates the following algorithm. First, we run a symmetric $k'$-center 2-approximation algorithm on $A$, for $k - 6 \le k' \le k$. For instance, iteratively pick an unmarked point, and mark all points within distance $2r^*$ of it ll2Tft. This gives us a 2-approximation for the centers in $A$, and thus a 3-approximation for $S$ minus the clusters with no centers in $A$. Then we brute-force search for the remaining (at most 6) centers to find a 3-approximation for $S$. Under $(3, \epsilon)$-perturbation resilience, this 3-approximation must be $\epsilon$-close to OPT. Now we state the theorem and the main structural lemma, which shows that at most 6 centers are bad. We give the full proof in Appendix D.
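The marking procedure mentioned above (pick an unmarked point, mark everything within $2r^*$ of it, repeat) can be sketched as follows. This assumes a symmetric distance matrix over the points being clustered and a candidate radius `r_star`; the name `greedy_mark_centers` is hypothetical. If every point lies within $r^*$ of one of $k$ optimal centers, the loop opens at most $k$ centers: any two chosen points are more than $2r^*$ apart, so by the triangle inequality they cannot share an optimal cluster.

```python
from typing import List, Sequence


def greedy_mark_centers(d: Sequence[Sequence[float]], r_star: float) -> List[int]:
    """Pick an unmarked point, mark all points within 2 * r_star of it, repeat."""
    n = len(d)
    marked = [False] * n
    centers: List[int] = []
    for p in range(n):
        if marked[p]:
            continue
        centers.append(p)  # p is farther than 2 * r_star from every earlier center
        for q in range(n):
            if d[p][q] <= 2 * r_star:
                marked[q] = True
    return centers
```

Every point ends up within $2r^*$ of some returned center, which is the 2-approximation guarantee used in the text.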

Algorithm 2 $(3, \epsilon)$-Perturbation Resilient Asymmetric $k$-center
Input: Asymmetric $k$-center instance $(S, d)$, $r^*$ (or try all possible candidates).

1. Build set $A = \{p \mid \forall q,\ d(q, p) \le r^* \implies d(p, q) \le r^*\}$.

2. Create the threshold graph $G$ with vertices $A$ and threshold distance $r^*$. Define a new symmetric $k$-center instance on $A$, using the lengths of the paths in the threshold graph.

3. Run a symmetric $k'$-center 2-approximation algorithm on the symmetrized instance. Start with $k' = k - 6$, and increase $k'$ by 1 until the algorithm returns a solution with radius at most $2r^*$. Denote the resulting set of centers by $C$.

4. Brute force over all size-$(k - x)$ subsets of $C$ and all size-$x$ subsets of $S$ for $x \le 6$, to find a set of size $k$ which is within distance $3r^*$ of all points in $S$. Denote this set by $C'$.

Output: Output the Voronoi tiling $G_1, \dots, G_k$ using $C'$ as the centers.
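Steps 1 and 2 of Algorithm 2 can be sketched as below. This is a minimal illustration under our own reading of step 2: two points of $A$ are joined when their distance (in either direction) is at most $r^*$, and the symmetrized distance is $r^*$ times the hop count of a shortest path, or infinity when no path exists. The matrix `d[p][q]` (distance from `p` to `q`) and the names `symmetric_set` and `threshold_path_metric` are hypothetical; steps 3 and 4 are omitted.

```python
from collections import deque
from math import inf
from typing import Dict, List, Sequence


def symmetric_set(d: Sequence[Sequence[float]], r_star: float) -> List[int]:
    """Step 1: A = {p : for all q, d(q, p) <= r* implies d(p, q) <= r*}."""
    n = len(d)
    return [p for p in range(n)
            if all(d[p][q] <= r_star for q in range(n) if d[q][p] <= r_star)]


def threshold_path_metric(d: Sequence[Sequence[float]], A: List[int],
                          r_star: float) -> Dict[int, Dict[int, float]]:
    """Step 2: distance on A = r* times the BFS hop count in the threshold graph."""
    adj = {p: [q for q in A if q != p and (d[p][q] <= r_star or d[q][p] <= r_star)]
           for p in A}
    metric: Dict[int, Dict[int, float]] = {}
    for s in A:
        hops = {s: 0}
        queue = deque([s])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in hops:
                    hops[v] = hops[u] + 1
                    queue.append(v)
        # unreachable points of A get infinite distance
        metric[s] = {t: (hops[t] * r_star if t in hops else inf) for t in A}
    return metric
```

In the toy check, point 2 is excluded from $A$ because point 1 reaches it within $r^*$ while the reverse distance is large.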


Theorem 19. Algorithm 2 runs in polynomial time and outputs a clustering that is $\epsilon$-close to OPT for $(3, \epsilon)$-perturbation resilient asymmetric $k$-center instances such that all optimal clusters have size greater than $2\epsilon n$.

Lemma 20. Given a $(3, \epsilon)$-perturbation resilient asymmetric $k$-center instance such that all optimal clusters have size greater than $2\epsilon n$, there are at most 6 centers $c_i$ such that there exists $q \notin C_i$ with $d(q, c_i) \le r^*$.
Proof idea. Assume the lemma is false. The first step is to construct a set $C$ of at most $k-3$ points which are within distance $3r^*$ of every point in $S$. Once we find $C$, we will be able to find 3 dummy centers which contradict $(3, \epsilon)$-perturbation resilience.

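By assumption, there exists a set $B$, $|B| \ge 7$, of centers $c_i$ such that there exists $q \in C_{i'}$, $i' \neq i$, with $d(q, c_i) \le r^*$. Then $d(c_{i'}, c_i) \le d(c_{i'}, q) + d(q, c_i) \le 2r^*$, and hence $d(c_{i'}, p) \le 3r^*$ for all $p \in C_i$. So for all $c_i \in B$, $\{c_l\}_{l=1}^k \setminus \{c_i\}$ is still within distance $3r^*$ of every point in $S$.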

To construct $C$, we carefully pick a subset $B' \subseteq B$ of size at least 3 such that no $c_i \in B'$ has its $c_{i'}$ also in $B'$. Then we can set $C = \{c_l\}_{l=1}^k \setminus B'$ and construct a 3-perturbation in which these centers achieve the optimal score. Now we find a contradiction by showing that not every combination of 3 dummy centers can simultaneously allow $\epsilon$-close clusterings.

If, for all $c \in C$, $c$ is the very best center in $S$ for some cluster $C_i$, then for any choice of dummy centers, $c$ will be the center for $C_i$ and no other cluster; thus it will not affect our analysis, and it is as if our instance has size $k' = k - |C|$. Then we can use Lemma 15 to arrive at a contradiction.

When some set of points $C_{\text{bad}} \subseteq C$ are not the best center for any cluster, we cannot use the same lemma, since there are some centers we must include in every subset. However, we can use the pigeonhole principle to show that at least one $c \in C_{\text{bad}}$ is the best center for two clusters, out of all other points in $C_{\text{bad}}$, and we use this to show a contradiction.


5 Weak center proximity 

In this section, we consider any center-based objective, not just $k$-center. A clustering objective function is center-based if the solution can be defined by choosing a set of centers $\{c_1, c_2, \dots, c_k\} \subseteq S$ and partitioning $S$ into $k$ clusters OPT $= \{C_1, C_2, \dots, C_k\}$ by assigning each point to its closest center. Furthermore, (1) the objective value of a given clustering is a weighted sum or maximum of the individual cluster scores; and (2) given a proposed single cluster, its score can be computed in polynomial time. $k$-median, $k$-means, and $k$-center are all center-based objectives.

Here, we show a novel algorithm that finds the optimal clustering in instances that satisfy two simple properties: each point is closer to its center than to any point in a different cluster, and we can recognize optimal clusters as soon as they are formed. Formally, we define these properties as follows:

1. weak center proximity: For all $p \in C_i$ and $q \in C_j$ with $j \neq i$, $d(c_i, p) < d(p, q)$.

2. cluster verifiability: There exists a polynomial-time computable function $f : 2^S \to \mathbb{R}$ such that, for $B \subseteq S$, if there is $i \in [k]$ such that $B \subsetneq C_i$, then $f(B) < 0$, and if $B \supseteq C_i$, then $f(B) \ge 0$.

Examples of cluster verifiable instances include any instance where all the optimal clusters are the same size ($f(B) = |B| - \frac{n}{k}$), or where all the optimal clusters have the same $k$-median/$k$-means cost ($f(B) = k\,\Phi(B) - \Phi(\text{OPT})$).

For any center-based objective, weak center proximity is a consequence of 2-perturbation resilience (cf. Lemma 22), so our algorithm relies on a much weaker assumption than $\alpha$-perturbation resilience for $\alpha \ge 2$ when instances are cluster verifiable.

All existing algorithms and analyses for $\alpha$-perturbation resilience require that for all $p \in C_i$ and $q \in C_j$, $d(c_i, p) < d(c_i, q)$. It is not at all obvious how one can even proceed without such a property, as in its absence, clusters can "overlap". That is, for a cluster with center $c_i$ and radius $r$, we cannot assume that $B_r(c_i)$ only includes points from $C_i$. Our challenge is then to show that, even in the absence of this property, there is still enough structure imposed by weak center proximity and cluster verifiability to find the optimal clustering efficiently.

Our Algorithm 3 is a novel linkage-based procedure. Given a clustering instance $(S, d)$, we start with a graph $G = (S, E)$ where $E = \emptyset$. In each round, we do single linkage on the components in $G$, except that we do not merge two components if both are supersets of optimal clusters (indicated by $f(B) \ge 0$). We put the single linkage edges from this round in a set $A$. This continues until every component is a superset of an optimal cluster. Then we throw away the set $A$ except for the very last edge that was added (see Figure 3 and Appendix E). We will prove this last edge is never between two points from different clusters, so we add that single edge to $E$ and then recurse. Here, we present a proof sketch of our main theorem. The details can be found in Appendix E.

Figure 3: The edge between $p$ and $q$ cannot be the last edge added to $A$.

Algorithm 3 Clustering under Weak Center Proximity and Cluster Verifiability
Input: Clustering instance $(S, d)$, function $f$, and $k < |S|$.

Set $G = (S, E)$ and $E = \emptyset$. While there are more than $k$ components in $G$, repeat (1) and (2):

1. Set $A = \emptyset$. While there exists a component $B$ in $G' = (S, E \cup A)$ such that $f(B) < 0$, add $(p, q)$ to $A$, where $d(p, q)$ is minimized such that $p$ and $q$ are in different components in $G'$ and at least one of these components $B$ has $f(B) < 0$.

2. Take the last edge $e$ that was added to $A$, and add $e$ to $E$.

Output: Output the components of $G$.
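A direct, unoptimized Python transcription of Algorithm 3 is sketched below. It assumes the instance is given as a distance matrix `d` over points $0, \dots, n-1$ and that $f$ is supplied as a Python callable on frozensets; `weak_proximity_linkage` and `components` are hypothetical names. Ties in step (1) are broken by scan order, which the algorithm description leaves open, and the code rescans all pairs for every added edge, so it illustrates the logic rather than an efficient implementation.

```python
from typing import Callable, Dict, FrozenSet, List, Optional, Sequence, Set, Tuple

Edge = Tuple[int, int]


def components(n: int, edges: List[Edge]) -> List[FrozenSet[int]]:
    """Connected components of the graph on vertices 0..n-1 (union-find)."""
    parent = list(range(n))

    def find(x: int) -> int:
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for p, q in edges:
        parent[find(p)] = find(q)
    groups: Dict[int, Set[int]] = {}
    for x in range(n):
        groups.setdefault(find(x), set()).add(x)
    return [frozenset(g) for g in groups.values()]


def weak_proximity_linkage(d: Sequence[Sequence[float]],
                           f: Callable[[FrozenSet[int]], float],
                           k: int) -> List[FrozenSet[int]]:
    n = len(d)
    E: List[Edge] = []
    while len(components(n, E)) > k:
        A: List[Edge] = []  # step (1): single-linkage edges of this round
        while True:
            comps = components(n, E + A)
            # f(B) < 0 means B is not yet a superset of an optimal cluster
            neg = [f(B) < 0 for B in comps]
            if not any(neg):
                break
            label = {x: i for i, B in enumerate(comps) for x in B}
            best: Optional[Edge] = None
            for p in range(n):
                for q in range(p + 1, n):
                    lp, lq = label[p], label[q]
                    # only link different components, at least one of which has f < 0
                    if lp == lq or not (neg[lp] or neg[lq]):
                        continue
                    if best is None or d[p][q] < d[best[0]][best[1]]:
                        best = (p, q)
            if best is None:  # only one component is left
                break
            A.append(best)
        if not A:  # no edge could be added; the assumptions must be violated
            break
        E.append(A[-1])  # step (2): keep only the last edge added to A
    return components(n, E)
```

Each outer round adds one edge joining two components of $G$, so the loop runs at most $n - k$ times. On two well-separated groups of three points with $f(B) = |B| - 3$, it recovers the two groups.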


Theorem 21. Given a center-based clustering instance satisfying weak center proximity and cluster verifiability, Algorithm 3 outputs OPT in polynomial time.

Proof Sketch. It suffices to show that step (2) never adds an edge between two points from different clusters. We proceed by induction. Assume it is true up to iteration $t$ of the first while loop. Now assume towards contradiction that in round $t$, the last edge added to $A$ is between two points $p \in C_i$ and $q \in C_j$, $i \neq j$ (see Figure 3). WLOG, for the component in $G'$ that includes $p$, called $P'$, we have $f(P') < 0$; otherwise the merge would not have happened. Furthermore, $c_i \in P'$ by weak center proximity. Then $f(P') < 0$ implies that $C_i \setminus P'$ is nonempty; call it $P$. The component(s) in $G$ corresponding to $P$ are strict subsets of $C_i$; therefore, $f(P) < 0$. So they must merge with another component, and by weak center proximity, the closest component is $P'$, but this contradicts our assumption that $(p, q)$ was the last edge added to $A$. □

6 Conclusions 

Our work pushes the understanding of (promise) stability conditions farther in three ways. We are the first to design computationally efficient algorithms to find the optimal clustering under $\alpha$-perturbation resilience with a constant value of $\alpha$ for a problem that is hard to approximate to any constant factor in the worst case, thereby demonstrating the power of perturbation resilience. Furthermore, we demonstrate the limits of this power by showing the first tight results in this space for both perturbation resilience and approximation stability. Finally, we show a surprising relation between symmetric and asymmetric instances, in that they are equivalent under resilience to 2-perturbations, which is in stark contrast to their widely differing tight approximation factors.
A Proofs from Section 3


Lemma 22. Properties 1 and 2 hold for asymmetric k-center instances satisfying 2-perturbation resilience. 

Proof. Property 1: Assume the property is false, i.e., $d(q, p) < d(c_i, p)$ for some $p \in C_i$ and $q \in A \cap C_j$, $j \neq i$. The idea will be that since $q$ is in $A$, it is close to its own center, so we can construct a perturbation in which $q$ replaces its center $c_j$. Then $p$ will join $q$'s cluster, causing a contradiction. Construct the following $d'$:

$$
d'(s,t) = \begin{cases} \min(2r^*, 2d(s,t)) & \text{if } s = q \text{ and } t \in C_j \cup \{p\}, \\ 2d(s,t) & \text{otherwise.} \end{cases}
$$

Since $q \in A$ and $d(c_j, q) \le r^*$, we have $d(q, c_j) \le r^*$, so $d(q, C_j) \le 2r^*$; also $d(q, p) < d(c_i, p) \le r^*$. Hence $d(q, C_j \cup \{p\}) \le 2r^*$, and $d'$ is a 2-perturbation. Then by Lemma 3, the optimal score is $2r^*$. The set of centers $\{c_l\}_{l=1}^k \setminus \{c_j\} \cup \{q\}$ achieves the optimal score, since $q$ is within distance $2r^*$ of $C_j$, and all other clusters have the same center as in OPT (achieving radius $2r^*$). Then for all $l \neq i, j$, $d'(q, p) < d'(c_i, p) \le d'(c_l, p)$, so $q$ is the best center for $p$ under $d'$ and $p$ joins $q$'s cluster. The resulting clustering differs from OPT, so we have a contradiction.

Property 2: Assume on the contrary that there exists $q \in C_j$, $j \neq i$, such that $d(q, c_i) \le r^*$. Now we define a $d'$ in which $q$ can become a center for $C_i$:

$$
d'(s,t) = \begin{cases} \min(2r^*, 2d(s,t)) & \text{if } s = q \text{ and } t \in C_i, \\ 2d(s,t) & \text{otherwise.} \end{cases}
$$

This is a 2-perturbation because $d(q, C_i) \le d(q, c_i) + r^* \le 2r^*$. Then by Lemma 3, the optimal score is $2r^*$. The set of centers $\{c_l\}_{l=1}^k \setminus \{c_i\} \cup \{q\}$ achieves the optimal score, since $q$ is within distance $2r^*$ of $C_i$, and all other clusters have the same center as in OPT (achieving radius $2r^*$). But the clustering with centers $\{c_l\}_{l=1}^k \setminus \{c_i\} \cup \{q\}$ is different from OPT, since (at the very least) $q$ and $c_j$ are in different clusters. This contradicts 2-perturbation resilience. □
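The perturbations in the proofs of Lemmas 22 and 23 share one shape: every distance is multiplied by $\alpha = 2$, except the distances from a single point $q$ to a target set $T$, which are capped at $\alpha r^*$. A sketch of that construction, with a check of the perturbation condition $d \le d' \le \alpha d$, is below. It assumes distances are stored as a matrix indexed `d[s][t]`; `capped_perturbation` and `is_alpha_perturbation` are hypothetical names. The check succeeds exactly when $d(q, t) \le \alpha r^*$ for every $t \in T$, which is the condition each proof verifies.

```python
from typing import List, Sequence, Set


def capped_perturbation(d: Sequence[Sequence[float]], q: int, T: Set[int],
                        r_star: float, alpha: float = 2.0) -> List[List[float]]:
    """d'(s, t) = min(alpha * r*, alpha * d(s, t)) if s == q and t in T, else alpha * d(s, t)."""
    n = len(d)
    return [[min(alpha * r_star, alpha * d[s][t]) if (s == q and t in T)
             else alpha * d[s][t]
             for t in range(n)] for s in range(n)]


def is_alpha_perturbation(d: Sequence[Sequence[float]],
                          d_prime: Sequence[Sequence[float]], alpha: float) -> bool:
    """Check d(s, t) <= d'(s, t) <= alpha * d(s, t) for every ordered pair."""
    n = len(d)
    return all(d[s][t] <= d_prime[s][t] <= alpha * d[s][t]
               for s in range(n) for t in range(n))
```

For Property 2, for example, one would take $T = C_i$; for Property 1 and Lemma 23, $T$ also contains $p$.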

Lemma 23. $A$ respects the structure of OPT.

Proof. From Lemma 22, we can use Property 2 in our analysis. First we show that $c_i \in A$ for all $i \in [k]$. Fix $c_i$. For all $p \in C_i$, $d(c_i, p) \le r^*$ by definition of OPT. For all $q \notin C_i$, by Property 2, $d(q, c_i) > r^*$. It follows that for any point $p \in S$, it cannot be the case that $d(p, c_i) \le r^*$ and $d(c_i, p) > r^*$. Therefore, $c_i \in A$.

Now we show that for all $p \in S \setminus A$, if $A(p) \in C_j$, then $p \in C_j$. Given $p \in S \setminus A$, let $p \in C_j$ and assume towards contradiction that $q = A(p) \in C_i$ for some $i \neq j$. We will construct a 2-perturbation $d'$ in which $q$ replaces $c_i$ as the center for $C_i$ and $p$ switches from $C_j$ to $C_i$, causing a contradiction. We construct $d'$ as follows. All distances are increased by a factor of 2 except for $d(q, p)$ and $d(q, q')$ for all $q' \in C_i$. These distances are increased by a factor of 2 up to $2r^*$. Formally,

$$
d'(s,t) = \begin{cases} \min(2r^*, 2d(s,t)) & \text{if } s = q \text{ and } t \in C_i \cup \{p\}, \\ 2d(s,t) & \text{otherwise.} \end{cases}
$$

This is a 2-perturbation because $d(q, C_i \cup \{p\}) \le 2r^*$ (as $q \in A \cap C_i$, $d(q, C_i) \le 2r^*$, and $d(q, p) \le d(c_j, p) \le r^*$ because $c_j \in A$). Then by Lemma 3, the optimal score is $2r^*$. The set of centers $\{c_l\}_{l=1}^k \setminus \{c_i\} \cup \{q\}$ achieves the optimal score, since $q$ is within distance $2r^*$ of $C_i$, and all other clusters have the same center as in OPT (achieving radius $2r^*$). But consider the point $p$. Since all centers are in $A$ and $q$ is the closest point to $p$ in $A$, $q$ is the center for $p$ under $d'$. Therefore, the optimal clustering under $d'$ is different from OPT, so we have a contradiction. □
B Proof of Theorem 10


In this section, we prove Theorem 10. The final reduction to $k$-center under $(2-\epsilon)$-approximation stability and large clusters is from a problem we define, called Unambiguous-Balanced-Perfect Dominating Set.

We use four NP-hard problems in a chain of reductions. Here, we define all of these problems up front. Perfect Dominating Set was introduced in [10]. We introduce the "balanced" variants of two existing problems for the first time.

Definition (3-Dimensional Matching (3DM)). Given three disjoint sets $X_1$, $X_2$, and $X_3$, each of size $m$, and a set $T$ such that each $t \in T$ is a triple $t = (x_1, x_2, x_3)$ with $x_1 \in X_1$, $x_2 \in X_2$, and $x_3 \in X_3$, the problem is to find a set $M \subseteq T$ of size $m$ which exactly hits all the elements in $X_1 \cup X_2 \cup X_3$; in other words, for all pairs of distinct triples $(x_1, x_2, x_3), (y_1, y_2, y_3) \in M$, it is the case that $x_1 \neq y_1$, $x_2 \neq y_2$, and $x_3 \neq y_3$.

Definition (Balanced-3-Dimensional Matching (B3DM)). This is the 3DM problem $(X_1, X_2, X_3, T)$ with the additional constraint that $2m \le |T| \le 3m$, where $|X_1| = |X_2| = |X_3| = m$.

Definition (Perfect Dominating Set (PDS)). Given a graph $G = (V, E)$ and an integer $k$, the problem is to find a set of vertices $D \subseteq V$ of size $k$ such that for all $v \in V \setminus D$, there exists exactly one $d \in D$ such that $(v, d) \in E$.

Definition (Balanced-Perfect-Dominating Set (BPDS)). This is the PDS problem with the additional assumption that if the graph has $n$ vertices and a dominating set of size $k$ exists, then each vertex in the dominating set hits at least $\frac{n}{2k}$ vertices.

Additionally, each problem has an "Unambiguous" variant, which adds the constraint that the problem has at most one solution. Valiant and Vazirani showed that Unambiguous-3SAT is hard unless NP = RP 6911. To show that the Unambiguous version of another problem is hard, one must establish a parsimonious reduction from Unambiguous-3SAT to that problem. A parsimonious reduction is one that conserves the number of solutions. For two problems $A$ and $B$, we write $A \le_{par} B$ to mean there is a reduction from $A$ to $B$ that is parsimonious and polynomial. Parsimony is often trivial to verify for reductions which involve 1-to-1 mappings; the difficulty arises when one element in $A$ maps to multiple elements in $B$. The reductions in this section are all 1-to-1 mappings, so their parsimony is easy to verify.

Now we start our argument. Dyer and Frieze showed Planar-3DM is NP-hard [18]. While planarity is not important for the purpose of our problems, their reduction from 3SAT has two other nice properties that we crucially depend on. First, the reduction is parsimonious, as pointed out in 621. Second, given their 3DM instance $X_1, X_2, X_3, T$, each element in $X_1 \cup X_2 \cup X_3$ appears in either two or three tuples in $T$. (Dyer and Frieze mention this observation just before their Theorem 2.3.) Since each tuple contains exactly one element of $X_1$, $|T|$ is the sum over the $m$ elements of $X_1$ of the number of tuples containing each, so it follows that $2m \le |T| \le 3m$. Thus they actually showed a stronger result: B3DM is NP-hard via a parsimonious reduction from 3SAT.

Next, Ben-David and Reyzin showed a reduction from 3DM to PDS [10]. Their reduction maps every element of $X_1$, $X_2$, $X_3$, and $T$ in the 3DM instance to a vertex in the PDS instance (adding a single extra vertex), so it is easily seen to be parsimonious.

We can use the same reduction to show B3DM $\le_{par}$ BPDS. Their reduction maps every element of $X_1$, $X_2$, $X_3$, and $T$ to a vertex in $V$, and adds one extra vertex $v$ to $V$. There is an edge from each element $(x_1, x_2, x_3) \in T$ to the corresponding elements $x_1 \in X_1$, $x_2 \in X_2$, and $x_3 \in X_3$. Furthermore, there is an edge from $v$ to every element in $T$. In [10], it is shown that if the 3DM instance is a YES instance with matching $M \subseteq T$, then the minimum dominating set is $\{v\} \cup M$, which has size $k = m + 1$. If we start with B3DM, our graph has $n = |X_1| + |X_2| + |X_3| + |T| + 1 \le 6m + 1$ vertices, since $|T| \le 3m$. Each $t \in M$ hits 3 nodes in the graph, and $\frac{n}{2k} \le \frac{6m+1}{2(m+1)} < 3$. Furthermore, $v$ hits $|T| - m \ge 2m - m = m$ nodes, and $\frac{6m+1}{2(m+1)} < m$ when $m \ge 3$. Therefore, the resulting PDS instance is indeed a BPDS instance.
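The construction and the balance bound can be checked on a small example. This is a minimal sketch: `threedm_to_pds` is a hypothetical helper written for this illustration, not code from [10]. It labels elements with strings and names the extra vertex `"v"`, and it reads "hits" as adjacency to a vertex outside $D$. The toy B3DM instance has $m = 3$, and every element appears in exactly two triples.

```python
def threedm_to_pds(X1, X2, X3, T):
    """Graph of the 3DM -> PDS reduction as an adjacency dict.

    Vertices: the elements of X1, X2, X3, the triples of T, and an extra vertex "v".
    Each triple is joined to its three elements, and "v" is joined to every triple.
    """
    adj = {x: set() for x in [*X1, *X2, *X3, *T, "v"]}
    for t in T:
        for x in t:
            adj[t].add(x)
            adj[x].add(t)
        adj[t].add("v")
        adj["v"].add(t)
    return adj


m = 3
X1 = [f"a{i}" for i in range(m)]
X2 = [f"b{i}" for i in range(m)]
X3 = [f"c{i}" for i in range(m)]
# Each element lies in exactly two triples, so |T| = 2m.
M = [(f"a{i}", f"b{i}", f"c{i}") for i in range(m)]
T = M + [(f"a{i}", f"b{(i + 1) % m}", f"c{(i + 2) % m}") for i in range(m)]

adj = threedm_to_pds(X1, X2, X3, T)
D = set(M) | {"v"}
n, k = len(adj), len(D)
perfect = all(len(adj[u] & D) == 1 for u in adj if u not in D)
hits = {d: len(adj[d] - D) for d in D}
balanced = all(2 * k * h >= n for h in hits.values())
print(n, k, hits["v"], perfect, balanced)  # 16 4 3 True True
```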
Now we have verified that there exists a chain of parsimonious reductions 3SAT $\le_{par}$ B3DM $\le_{par}$ BPDS, so it follows that Unambiguous-BPDS is hard unless NP = RP.

At this point, we use the same reduction as in [10] to reduce from Unambiguous-BPDS to k-center clustering under $(2-\varepsilon)$-approximation stability, where all clusters have size $\ge \frac{n}{2k}$. The difference is that we must verify that the resulting instance is $(2-\varepsilon)$-approximation stable, which requires unambiguity.

Theorem 10. There is no polynomial time algorithm for finding the optimal k-center clustering under $(2-\varepsilon)$-approximation stability, even when assuming all optimal clusters have size $\ge \frac{n}{2k}$, unless NP = RP.

Proof. Let $\varepsilon > 0$ be given. From the previous discussion, Unambiguous-BPDS is hard unless NP = RP. Now we reduce it to k-center clustering and show that the resulting instance has all clusters of size $\ge \frac{n}{2k}$ and satisfies $(2-\varepsilon)$-approximation stability.

Given an instance of Unambiguous-BPDS, for every $v \in V$, create a point $v \in S$ in the clustering instance. For every edge $(u, v) \in E$, let $d(u, v) = 1$; otherwise let $d(u, v) = 2$. Since all distances are either 1 or 2, the triangle inequality is trivially satisfied. Then a k-center solution of cost 1 exists if and only if there exists a dominating set of size $k$.
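The correspondence between cost-1 solutions and dominating sets can be checked on a toy graph by brute force. This is a minimal sketch: the helper names are hypothetical, and the exhaustive search is exponential in $k$, so it is only suitable for tiny inputs.

```python
from itertools import combinations


def pds_to_kcenter(adj):
    """Distances of the reduction: 1 on edges, 2 on non-edges, 0 on the diagonal."""
    V = sorted(adj)
    return {(u, w): 0 if u == w else (1 if w in adj[u] else 2) for u in V for w in V}


def min_kcenter_cost(adj, k):
    """Optimal k-center radius, by exhaustive search over all k-subsets of centers."""
    d = pds_to_kcenter(adj)
    V = sorted(adj)
    return min(
        max(min(d[(c, u)] for c in centers) for u in V)
        for centers in combinations(V, k)
    )


adj = {0: {1, 2}, 1: {0}, 2: {0}, 3: {4, 5}, 4: {3}, 5: {3}}
d = pds_to_kcenter(adj)
# Off-diagonal distances lie in {1, 2}, so the triangle inequality holds automatically.
assert all(d[(u, w)] <= d[(u, x)] + d[(x, w)] for u in adj for w in adj for x in adj)
print(min_kcenter_cost(adj, 2), min_kcenter_cost(adj, 1))  # 1 2
```

Here $\{0, 3\}$ is a dominating set of size 2, so the optimal 2-center cost is 1. No single vertex dominates the graph, so the optimal 1-center cost is 2.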

Since each vertex in the dominating set hits at least $\frac{n}{2k}$ vertices, each resulting cluster (a center together with the vertices it hits) has size at least

$$\frac{n}{2k} + 1.$$

Additionally, if there exists a dominating set of size $k$, then the corresponding optimal k-center clustering has cost 1. Because this dominating set is perfect and unique, any other clustering has cost 2. Since $(2-\varepsilon) \cdot 1 < 2$, no other clustering is within a factor of $2-\varepsilon$ of the optimal cost, so the k-center instance is $(2-\varepsilon)$-approximation stable.

□


C $(2, \varepsilon)$-Approximation Stability for Symmetric k-center

In this section, we consider k-center clustering under approximation stability (which was defined in Section 33).

Since $(\alpha, \varepsilon)$-approximation stability is strictly stronger than $(\alpha, \varepsilon)$-perturbation resilience, our results from Sections 3 and 4 immediately extend to approximation stability. Namely, there are polynomial time algorithms for finding the optimal symmetric or asymmetric k-center clustering under 2-approximation stability and the optimal symmetric k-center clustering under $(3, \varepsilon)$-approximation stability, and an algorithm that returns an $\varepsilon$-close clustering for asymmetric k-center under $(3, \varepsilon)$-approximation stability; the latter two results assume the optimal clusters have size $> 2\varepsilon n$. In this section, we provide an algorithm that finds OPT for symmetric k-center under $(2, \varepsilon)$-approximation stability, assuming the optimal clusters have size $> \varepsilon n$.

A key insight behind our result is that any $p \in C_i$ is at distance $\le 2r^*$ from all points in $C_i$ (by the triangle inequality). So any two points in the same cluster have more than $\varepsilon n$ points in common that are within $2r^*$ of both. On the other hand, if a point $p \in C_i$ were also at distance $\le 2r^*$ from more than $\varepsilon n$ points in other clusters, then using $p$ as the center of $C_i$ would lead to a partition whose cost is at most $2r^*$ but which is not $\varepsilon$-close to OPT. This contradicts $(2, \varepsilon)$-approximation stability. Therefore, two points from different clusters can only have a small number of points in common that are within $2r^*$ of both of them. Based on this insight, Algorithm 4 places two points $p$ and $q$ in the same part of the partition iff $|B_{2r^*}(p) \cap B_{2r^*}(q)| > \varepsilon n$. We show that this procedure indeed returns a partition that corresponds to OPT.

Algorithm 4 $(2, \varepsilon)$-APPROXIMATION STABLE k-CENTER FOR LARGE CLUSTERS
Input: Symmetric k-center instance $(S, d)$, $r^*$ (or try all possible candidates).

1. Define $G = (S, E)$ such that $E = \{(p, q) \mid |B_{2r^*}(p) \cap B_{2r^*}(q)| > \varepsilon n\}$.

Output: Connected components of $G$.
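Algorithm 4 translates almost line by line into code. This is a minimal Python sketch. It assumes the distances are given as a symmetric matrix and that $r^*$ is known; otherwise one could run it once for each candidate value, namely each pairwise distance. The function name and the union-find bookkeeping are choices made for this sketch, not details from the text. The toy input only illustrates the mechanics and is not claimed to be approximation stable.

```python
def algorithm4(dist, r_star, eps):
    """Join p and q iff |B_{2r*}(p) ∩ B_{2r*}(q)| > eps * n, then return the
    connected components of the resulting graph (sorted lists of indices)."""
    n = len(dist)
    balls = [{j for j in range(n) if dist[i][j] <= 2 * r_star} for i in range(n)]

    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for p in range(n):
        for q in range(p + 1, n):
            if len(balls[p] & balls[q]) > eps * n:
                parent[find(p)] = find(q)

    components = {}
    for i in range(n):
        components.setdefault(find(i), []).append(i)
    return sorted(components.values())


# Toy 1-D input: two tight groups far apart.
pos = [0.0, 0.1, 0.2, 0.3, 10.0, 10.1, 10.2, 10.3]
dist = [[abs(a - b) for b in pos] for a in pos]
print(algorithm4(dist, r_star=0.2, eps=0.25))  # [[0, 1, 2, 3], [4, 5, 6, 7]]
```

The strict inequality matches the edge rule of Algorithm 4. With `eps=0.5` the threshold becomes $\varepsilon n = 4$, no ball intersection exceeds it, and every point ends up in its own component.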


The following theorem formalizes our previous discussion. 
Theorem 24. Given a $(2, \varepsilon)$-approximation stable k-center instance such that for all $i$, $|C_i| > \varepsilon n$, Algorithm 4 returns OPT in polynomial time.

Proof. It suffices to show that for the optimal score $r^*$, $p$ and $q$ are in the same cluster if and only if

$$|B_{2r^*}(p) \cap B_{2r^*}(q)| > \varepsilon n.$$

First we show the forward direction. Assume $p$ and $q$ are in the same cluster $C_i$. For any pair of points $s_1, s_2 \in C_i$, $d(s_1, s_2) \le d(s_1, c_i) + d(c_i, s_2) \le 2r^*$. Therefore $C_i \subseteq B_{2r^*}(p)$ and $C_i \subseteq B_{2r^*}(q)$, so $\varepsilon n < |C_i| \le |B_{2r^*}(p) \cap B_{2r^*}(q)|$.

For the reverse direction, assume on the contrary that there are $p \in C_i$ and $q \in C_j$, $i \neq j$, such that $|B_{2r^*}(p) \cap B_{2r^*}(q)| > \varepsilon n$. Take any $\varepsilon n + 1$ points from $B_{2r^*}(p) \cap B_{2r^*}(q)$ and put them into a set $M$. Partition $M$ into $M(p)$ and $M(q)$ such that $M(q) = M \cap C_i$ and $M(p) = M \setminus M(q)$. Consider the partition with parts $(C_i \cup M(p)) \setminus M(q)$ and $(C_j \cup M(q)) \setminus M(p)$. For all $x \in (C_i \cup M(p)) \setminus M(q)$, $d(p, x) \le 2r^*$, and similarly, for all $x \in (C_j \cup M(q)) \setminus M(p)$, $d(q, x) \le 2r^*$. So, when $p$ and $q$ serve as the centers of $(C_i \cup M(p)) \setminus M(q)$ and $(C_j \cup M(q)) \setminus M(p)$, respectively, the cost of the clustering is at most $2r^*$. But this partition differs from OPT by $|M| = \varepsilon n + 1$ points, so it is not $\varepsilon$-close to OPT. This contradicts $(2, \varepsilon)$-approximation stability. □


D Proofs from Section 4

D.1 Symmetric k-center

Lemma 14. Given a clustering instance satisfying $(3, \varepsilon)$-perturbation resilience in which all optimal clusters have size $> 2\varepsilon n$, assume there are two points from different clusters which are at distance $\le r^*$ from each other. Then there exists a partition $S_x \cup S_y$ of $S$ such that for all $p, q \in S_x$, $d(p, q) \le 2r^*$, and similarly for all $p, q \in S_y$, $d(p, q) \le 2r^*$.

Proof. First we prove the lemma assuming that a CCC2 exists, and then we prove the other case. When a CCC2 exists, we do not need the assumption that two points from different clusters are close.

Case 1: There exists a CCC2. If there exists a CCC, then denote by $c_x$ a CCC for $C_y$. If there does not exist a CCC, then denote by $c_x$ a CCC2 for $C_y$. We will show that all points are close to either $C_x$ or $C_y$. $c_x$ is at distance $\le r^*$ from all but $\varepsilon n$ points in $C_y$. Therefore, $d(c_x, c_y) \le 2r^*$, and so $c_x$ is at distance $\le 3r^*$ from all points in $C_y$. Consider the following $d'$.


$$d'(s, t) = \begin{cases} \min(3r^*, 3d(s, t)) & \text{if } s = c_x,\ t \in C_y, \\ 3d(s, t) & \text{otherwise.} \end{cases}$$


This is a 3-perturbation because $c_x$ is at distance $\le 3r^*$ from every $t \in C_y$, so $d(c_x, t) \le \min(3r^*, 3d(c_x, t)) \le 3d(c_x, t)$. Then by Lemma 3, the optimal score is $3r^*$. Given any $p \in S$, the set of centers $\{c_\ell\}_{\ell=1}^{k} \setminus \{c_y\} \cup \{p\}$ achieves the optimal score, since $c_x$ is at distance $3r^*$ from $C_y$, and all other clusters have the same center as in OPT (achieving radius $3r^*$). Therefore, this set of centers must create a partition that is $\varepsilon$-close to OPT, or else there would be a contradiction. Then from Fact 12, one of the centers in $\{c_\ell\}_{\ell=1}^{k} \setminus \{c_y\} \cup \{p\}$ must be the center for the majority of points in $C_y$ under $d'$. If this center is $c_\ell$, $\ell \neq x, y$, then for the majority of points $q \in C_y$, $d(c_\ell, q) \le r^*$ and $d(c_\ell, q) < d(c_z, q)$ for all $z \neq \ell, y$. Then by definition, $c_\ell$ is a CCC for $C_y$. But then $\ell$ must equal $x$, so we have a contradiction. Note that if some $c_\ell$ satisfies $d(c_\ell, q) \le d(c_z, q)$ (non-strict inequality) for all $z \neq \ell, y$ for the majority of $q \in C_y$, then there is another equally good partition in which $c_\ell$ is not the center for the majority of points in $C_y$, so we still obtain a contradiction. Therefore, either $p$ or $c_x$ must be the center for the majority of points in $C_y$ under $d'$.
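The condition that makes $d'$ a valid 3-perturbation ($c_x$ lies within $3r^*$ of every point of $C_y$) can be seen in a small check. The sketch below uses hypothetical helper names, represents points by indices, and stores distances in a matrix. It mirrors the piecewise definition of $d'$ with a generic `center` and `targets`, so the same helper also covers the Case 2 perturbation (center $p$, targets $C_x \cup C_y$). It then tests $d \le d' \le 3d$ entrywise.

```python
def perturb(d, center, targets, r_star, alpha=3):
    """d'(s, t) = min(alpha*r*, alpha*d(s, t)) if s == center and t in targets,
    and alpha*d(s, t) otherwise."""
    n = len(d)
    dp = [[alpha * d[s][t] for t in range(n)] for s in range(n)]
    for t in targets:
        dp[center][t] = min(alpha * r_star, alpha * d[center][t])
    return dp


def is_alpha_perturbation(d, dp, alpha=3):
    """True iff d(s, t) <= d'(s, t) <= alpha * d(s, t) for every pair (s, t)."""
    n = len(d)
    return all(d[s][t] <= dp[s][t] <= alpha * d[s][t] for s in range(n) for t in range(n))


pos = [0, 1, 2.5, 3, 4]
d = [[abs(a - b) for b in pos] for a in pos]
# Point 0 plays the role of c_x with r* = 1: points 2 and 3 are within 3r* = 3, point 4 is not.
print(is_alpha_perturbation(d, perturb(d, 0, [2, 3], r_star=1)))     # True
print(is_alpha_perturbation(d, perturb(d, 0, [2, 3, 4], r_star=1)))  # False
```

When a target lies farther than $3r^*$ from the center, capping the distance at $3r^*$ pushes it below the original distance, so $d'$ is no longer a perturbation.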

If $c_x$ is the center for the majority of points in $C_y$, then $p$ must be the center for the majority of points in $C_x$ (it cannot be a different center $c_\ell$, since $c_x$ is a better center for $C_x$ than $c_\ell$ by definition). Therefore, each $p \in S$ is at distance $\le r^*$ from all but $\varepsilon n$ points in either $C_x$ or $C_y$.
Now partition all the non-centers into two sets $S_x$ and $S_y$, such that $S_x = \{p \mid d(p, q) \le r^* \text{ for the majority of points } q \in C_x\}$ and $S_y = \{p \mid p \notin S_x \text{ and } d(p, q) \le r^* \text{ for the majority of points } q \in C_y\}$.

Then given $p, q \in S_x$, there exists an $s \in C_x$ such that $d(p, q) \le d(p, s) + d(s, q) \le 2r^*$ (since both points are close to more than half of the points in $C_x$). Similarly, any two points $p, q \in S_y$ are at distance $\le 2r^*$. See Figure 2b. This proves Case 1.
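The split into $S_x$ and $S_y$ is a strict-majority test. The following sketch uses hypothetical helper names and represents points by indices. It builds the two sets on a toy 1-D input and checks the $2r^*$ diameter bound. The toy input only illustrates the mechanics and is not claimed to be perturbation resilient.

```python
def split_by_majority(d, points, Cx, Cy, r_star):
    """S_x: points within r* of a strict majority of C_x.
    S_y: remaining points within r* of a strict majority of C_y.
    Any other point goes to `rest` (the lemma's argument shows it is empty)."""
    Sx, Sy, rest = [], [], []
    for p in points:
        if 2 * sum(d[p][q] <= r_star for q in Cx) > len(Cx):
            Sx.append(p)
        elif 2 * sum(d[p][q] <= r_star for q in Cy) > len(Cy):
            Sy.append(p)
        else:
            rest.append(p)
    return Sx, Sy, rest


def diameter(d, part):
    """Largest pairwise distance within `part` (0 for an empty part)."""
    return max((d[p][q] for p in part for q in part), default=0)


pos = [0, 0.5, 1, 5, 5.5, 6, 0.2, 5.7]
d = [[abs(a - b) for b in pos] for a in pos]
Sx, Sy, rest = split_by_majority(d, range(len(pos)), [0, 1, 2], [3, 4, 5], r_star=0.5)
print(Sx, Sy, rest)  # [0, 1, 2, 6] [3, 4, 5, 7] []
print(max(diameter(d, Sx), diameter(d, Sy)) <= 2 * 0.5)  # True
```

Two points that are each within $r^*$ of a strict majority of $C_x$ share at least one such point of $C_x$, which is exactly the triangle-inequality step in the text.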

Case 2: There does not exist a CCC2. Now we use the assumption that there exist $p \in C_x$ and $q \in C_y$, $x \neq y$, such that $d(p, q) \le r^*$. Then by the triangle inequality, $p$ is at distance $\le 3r^*$ from all points in $C_x$ and $C_y$. Consider the following $d'$.


$$d'(s, t) = \begin{cases} \min(3r^*, 3d(s, t)) & \text{if } s = p,\ t \in C_x \cup C_y, \\ 3d(s, t) & \text{otherwise.} \end{cases}$$


This is a 3-perturbation because $p$ is at distance $\le 3r^*$ from every point of $C_x \cup C_y$. Then by Lemma 3, the optimal score is $3r^*$. Given any $s \in S$, the set of centers $\{c_\ell\}_{\ell=1}^{k} \setminus \{c_x, c_y\} \cup \{p, s\}$ achieves the optimal score, since $p$ is at distance $3r^*$ from $C_x \cup C_y$, and all other clusters have the same center as in OPT (achieving radius $3r^*$). Therefore, this set of centers must create a partition that is $\varepsilon$-close to OPT, or else there would be a contradiction. Then from Fact 12, one of the centers in $\{c_\ell\}_{\ell=1}^{k} \setminus \{c_x, c_y\} \cup \{p, s\}$ must be the center for the majority of points in $C_x$ under $d'$.

If this center is $c_\ell$, $\ell \neq x, y$, then for the majority of points $t \in C_x$, $d(c_\ell, t) \le r^*$ and $d(c_\ell, t) < d(c_z, t)$ for all $z \neq \ell, x, y$. Then by definition, $c_\ell$ is a CCC2 for $C_x$, and we have a contradiction.

Similar logic applies to the center for the majority of points in $C_y$. Therefore, $p$ and $s$ must be the centers for $C_x$ and $C_y$. Since $s$ was an arbitrary non-center, all non-centers are at distance $\le r^*$ from all but $\varepsilon n$ points in either $C_x$ or $C_y$.

Now partition all the non-centers into two sets $S_x$ and $S_y$, such that $S_x = \{p \mid d(p, q) \le r^* \text{ for the majority of points } q \in C_x\}$ and $S_y = \{p \mid p \notin S_x \text{ and } d(p, q) \le r^* \text{ for the majority of points } q \in C_y\}$.

Then given $p, q \in S_x$, there exists an $s \in C_x$ such that $d(p, q) \le d(p, s) + d(s, q) \le 2r^*$ (since both points are close to more than half of the points in $C_x$). Similarly, any two points $p, q \in S_y$ are at distance $\le 2r^*$. This proves Case 2. □


Lemma 15. Let $U$ be a set of $k$ elements and $C$ (disjoint from $U$) a set of $k + 2$ elements, and suppose each $u \in U$ ranks all elements of $C$ without ties. Then it is not possible that for all sets $C' \subseteq C$ with $|C'| = k$, each $c \in C'$ is ranked highest (among the elements of $C'$) by exactly one $u \in U$.


Proof. We will prove this by induction, starting at $k = 3$. (In fact, the lemma can be proven directly, but it is notationally less taxing to prove the main part of the lemma for $k = 3$.) Suppose we are given $u_1, u_2, u_3 \in U$, $c_1, c_2, c_3, c_4, c_5 \in C$, and preference lists for each $u \in U$, such that for all $C' \subseteq C$ with $|C'| = 3$, each $c \in C'$ is ranked highest by exactly one $u \in U$. Call this the unique ranking property.

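Without loss of generality, say that $c_1$, $c_2$, and $c_3$ are ranked highest by $u_1$, $u_2$, and $u_3$, respectively. (The three top choices must be distinct: if two elements of $U$ shared a top choice, then in any triple containing it, that element would be ranked highest by both.) Now consider the triple $\{c_1, c_2, c_x\}$, for $x = 4$ or $x = 5$. Since $c_1$ is ranked highest by $u_1$ and $c_2$ is ranked highest by $u_2$, $u_3$ must rank $c_x$ higher than $c_1$ and $c_2$. Similar logic holds for the sets $\{c_1, c_3, c_x\}$ and $\{c_2, c_3, c_x\}$, and we conclude that $u_1$, $u_2$, and $u_3$ each rank both $c_4$ and $c_5$ above the two elements of $\{c_1, c_2, c_3\}$ other than their own top choice.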

Now consider the set $\{c_1, c_4, c_5\}$. $u_1$ ranks $c_1$ highest, and WLOG, let $u_2$ rank $c_4$ higher than $c_5$. It follows that $u_3$ must rank $c_5$ higher than $c_4$.

Case 1: $u_1$ ranks $c_4$ higher than $c_5$. Then there is a contradiction in the set $\{c_3, c_4, c_5\}$, because $u_1$ and $u_2$ both rank $c_4$ higher than $c_3$ and $c_5$.

Case 2: $u_1$ ranks $c_5$ higher than $c_4$. Then there is a contradiction in the set $\{c_2, c_4, c_5\}$, because $u_1$ and $u_3$ both rank $c_5$ higher than $c_2$ and $c_4$.

Our base case is now proven. The inductive step follows easily. Assume that the unique ranking property does not hold for any instance with $|U| = i - 1$ and $|C| = i + 1$. Assume there exist $U = \{u_1, \dots, u_i\}$, $C = \{c_1, \dots, c_{i+2}\}$, and preference lists which satisfy the unique ranking property. As before, WLOG $c_1, \dots, c_i$ are ranked highest by $u_1, \dots, u_i$, respectively. Then let $U' = U \setminus \{u_i\}$ and $C' = C \setminus \{c_i\}$, where each $u \in U'$ has the same preference list as before, but with $c_i$ removed. In order for $U, C$ to satisfy the unique ranking property, $U', C'$ must also satisfy it. Otherwise, we could find a $C'' \subseteq C'$ with $|C''| = i - 1$ in which some $c \in C''$ is not ranked highest by any $u \in U'$. Then in $C'' \cup \{c_i\}$ and $U$, $u_i$ ranks $c_i$ highest, so $c$ will still not be ranked highest by any $u \in U$, and $U, C$ would violate the property. Hence $U', C'$ satisfy the unique ranking property, which contradicts our inductive hypothesis. □
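For small $k$ the unique ranking property can be checked exhaustively, which gives a quick sanity check of the base case. This is a minimal sketch with hypothetical helper names. It fixes the first ranking to the identity without loss of generality, since relabelling the elements of $C$ preserves the property, and counts the remaining profiles that satisfy it. For $k = 3$ it finds none, as the base case asserts. For $k = 2$ it finds exactly one (the two rankings are reverses of each other), which is consistent with the induction starting at $k = 3$.

```python
from itertools import combinations, permutations, product


def has_unique_ranking_property(rankings, candidates, k):
    """True iff, for every k-subset C' of candidates, the k rankers' favourite
    elements of C' are pairwise distinct (so each c in C' is top for exactly one ranker)."""
    for subset in combinations(candidates, k):
        tops = {min(subset, key=ranking.index) for ranking in rankings}
        if len(tops) != k:
            return False
    return True


def count_profiles(k):
    """Number of profiles of k rankings over k + 2 candidates with the property,
    with ranker 0's ranking fixed to the identity (WLOG up to relabelling)."""
    candidates = tuple(range(k + 2))
    count = 0
    for others in product(permutations(candidates), repeat=k - 1):
        if has_unique_ranking_property((candidates, *others), candidates, k):
            count += 1
    return count


print(count_profiles(2), count_profiles(3))  # 1 0
```

The search for $k = 3$ visits $120^2 = 14{,}400$ profiles, which runs in well under a few seconds. Larger $k$ grows factorially and is not practical this way.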

D.2 Proof of Theorem 18

Theorem 18. For all $\alpha > 1$ and $\varepsilon > 0$, finding the optimal solution for k-center under $(\alpha, \varepsilon)$-perturbation resilience is NP-hard.

Proof. Let $\alpha > 1$ and $\varepsilon > 0$ be given. We show a reduction from the standard symmetric k-center problem, which is NP-hard. Let $(S, d)$ be a k-center instance with optimal partition OPT, and let $D$ denote the diameter of the dataset, i.e., $D = \max_{p, q \in S} d(p, q)$.

We construct an $(\alpha, \varepsilon)$-perturbation resilient $k'$-center instance $(S', d')$ as follows. Add all the points from $S$ to $S'$, so $S \subseteq S'$. Add $N = n/\varepsilon$ additional points $p_1, \dots, p_N$. Now we define $d'$: for all $p, q \in S$, $d'(p, q) = d(p, q)$. For each $p_i$ and each $q \in S'$ with $q \neq p_i$, $d'(p_i, q) = \alpha(D + 1)$. Finally, let $k' = k + N$.

Now, in this clustering instance, note that each $p_i$ is at distance $\alpha(D + 1)$ from every other point. Therefore, to obtain a clustering with radius $< \alpha(D + 1)$, we must put each $p_i$ in its own cluster. Then we have the points in $S$ left to cluster, with $k$ centers. The optimal way to cluster $S$ is OPT, and the maximum cluster radius in $S$ is $\le D$ by construction. Then for any $r \le D$, there exists a solution for the k-center instance $(S, d)$ with max radius $\le r$ if and only if there exists a solution for the $k'$-center instance $(S', d')$ with max radius $\le r$.

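$(S', d')$ is $(\alpha, \varepsilon)$-perturbation resilient: let $d''$ be an $\alpha$-perturbation of $d'$. Note that if the original OPT of $(S, d)$ has max radius $r^*$, then we can achieve a max radius of $\alpha r^*$ on $(S', d'')$ by keeping each $p_i$ in its own cluster. Any clustering in which some $p_i$ is not in its own cluster must have max radius at least $\alpha(D + 1) > \alpha r^*$. Call the optimal partition under $(S', d')$ $\mathcal{C}$ and the optimal partition under $(S', d'')$ $\mathcal{C}'$. Both keep each $p_i$ as a singleton, so they can differ only on the $n$ points of $S$. Note $|S'| = N + n = n/\varepsilon + n$. Then

$$\frac{n}{|S'|} = \frac{n}{n/\varepsilon + n} = \frac{\varepsilon}{1 + \varepsilon} < \varepsilon.$$

By the above argument, $\mathcal{C}$ and $\mathcal{C}'$ must be $\varepsilon$-close. Therefore, $(S', d')$ is $(\alpha, \varepsilon)$-perturbation resilient. □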
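The padded instance is easy to build explicitly. This is a minimal sketch: `pad_instance` is a hypothetical helper, and it rounds $N = n/\varepsilon$ up for the case where $n/\varepsilon$ is not an integer, which is an assumption of this sketch. It uses `Fraction` so that the closeness bound $\frac{n}{|S'|} = \frac{\varepsilon}{1+\varepsilon} < \varepsilon$ can be checked exactly rather than in floating point.

```python
import math
from fractions import Fraction


def pad_instance(d, k, alpha, eps):
    """Padded instance from the proof: keep d on S and add N = ceil(n / eps) far points.

    d is an n x n symmetric distance matrix. Each added point is at distance
    alpha * (D + 1) from every other point, where D is the diameter of (S, d).
    Returns the new distance matrix and k' = k + N.
    """
    n = len(d)
    D = max(max(row) for row in d)
    N = math.ceil(Fraction(n) / eps)
    far = alpha * (D + 1)
    size = n + N
    dp = [[0] * size for _ in range(size)]
    for a in range(size):
        for b in range(size):
            if a != b:
                dp[a][b] = d[a][b] if a < n and b < n else far
    return dp, k + N


d = [[0, 1, 2], [1, 0, 1], [2, 1, 0]]  # three points on a line, D = 2
eps = Fraction(1, 2)
dp, k_prime = pad_instance(d, k=1, alpha=2, eps=eps)
share_of_S = Fraction(len(d), len(dp))  # fraction of points that can change clusters
print(k_prime, dp[0][5], share_of_S)  # 7 6 1/3
print(share_of_S == eps / (1 + eps), share_of_S < eps)  # True True
```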


D.3 Asymmetric k-center


See Figure 4a for an example of a $(3, \varepsilon)$-perturbation resilient instance with one bad center.

Lemma 25. The clustering instance in Figure 4a satisfies $(\alpha, \frac{1}{18})$-perturbation resilience.


Proof. $n = 18$, so we will argue that under any $\alpha$-perturbation, at most one point switches clusters. Clearly the optimal centers are $c_x$, $c_y$, and $c_z$, and $r^* = 1$. Under any $\alpha$-perturbation, the set of optimal centers must contain $c_x$ and $c_z$, since no other points are close to $x_1, \dots, x_5$ and $z_1, \dots, z_5$. For the final center, we cannot pick any $y_i$, since $y_i$ is not close to $y_j$ for $j \neq i$. However, we can pick any $x_i$ or $z_i$, and no matter which point we pick, it will be closer to $y_1, \dots, y_5$ than $c_x$ and $c_z$ are (even when distances are scaled by a factor of $\alpha$). Then at most one point (the new center) switches clusters, so the resulting clustering is still $\frac{1}{18}$-close to the optimal clustering. □


Lemma 20. Given a $(3, \varepsilon)$-perturbation resilient asymmetric k-center instance such that all optimal clusters have size $> 2\varepsilon n$, there are at most 6 centers $c_i$ such that $\exists q \notin C_i$ with $d(q, c_i) \le r^*$.
Figure 4: Illustrations of an instance with bad centers and the proof of Lemma 20.
(a) A $(3, \varepsilon)$-perturbation resilient asymmetric k-center instance with one bad center ($c_y$). The orange arrows are distance 1, and the black arrows are distance …
(b) An illustration of Case 2. The blue balls are the optimal clusters. The purple balls are the set $D'$.


Proof. Assume the lemma is false. The first step is to construct a set $C$ of $\le k - 3$ points which are at distance $\le 3r^*$ from every point in $S$. Once we find $C$, we will show how to find 3 dummy centers which contradict $(3, \varepsilon)$-perturbation resilience.

By assumption, there exists a set $B$, $|B| \ge 7$, of centers $c_i$ such that $\exists q_i \notin C_i$ with $d(q_i, c_i) \le r^*$. Define $a(i)$ as the center of $q_i$'s cluster (for all $c_i \in B$). Then $d(a(i), q_i) \le r^*$, and so $d(a(i), C_i) \le d(a(i), q_i) + d(q_i, c_i) + d(c_i, C_i) \le 3r^*$. So one might think we can remove $B$ from the set of optimal centers, and the remaining centers are still within $3r^*$ of all points in $S$. However, what if $\exists c_i \in B$ such that $a(i)$ is also in $B$? Then we cannot take out the entire set $B$. Our goal is to find $B' \subseteq B$ such that $\forall c_i \in B'$, $a(i) \notin B'$, and $|B'| \ge 3$ (this will allow us to set $C = \{c_i\}_{i=1}^{k} \setminus B'$).

Construct a graph $G = (V, E)$ whose vertices are the $c_i \in B$. For each $c_i \in B$, if $a(i) \in B$, then add a directed edge $(c_i, a(i))$. Then every vertex has out-degree $\le 1$. Finding $B'$ corresponds to finding $\ge 3$ vertices with no edges to one another. Consider a (weakly) connected component $G' = (V', E')$ of $G$. Then $|E'| \ge |V'| - 1$. Since every vertex has out-degree $\le 1$, $|E'| \le |V'|$. Then we have two cases.


Case 1: $|E'| = |V'| - 1$. Then $G'$ is a tree, and since a tree is bipartite, we can find a set of at least $\lceil |V'|/2 \rceil$ vertices with no edges to one another.

Case 2: $|E'| = |V'|$. Then $G'$ contains a cycle; removing one vertex of the cycle leaves a forest on $|V'| - 1$ vertices, so we can find a set of at least $\lfloor |V'|/2 \rfloor$ vertices with no edges to one another.

In both cases a component with $|V'|$ vertices contributes at least $|V'|/3$ such vertices, so it follows that we can always find $\lceil |B|/3 \rceil$ vertices with no edges to one another (with equality when $G$ consists only of disjoint 3-cycles). For $|B| \ge 7$, there exists such a set $B'$ of size $\ge 3$. Then we have the property that $c_i \in B' \Rightarrow a(i) \notin B'$.
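The set $B'$ can also be found constructively. This sketch takes a different route from the counting argument above. Because every vertex has out-degree at most 1, every subgraph has at most as many edges as vertices, so some remaining vertex always has degree at most 2. Greedy colouring in the reverse of a min-degree elimination order therefore uses at most 3 colours, and the largest colour class has no internal edges and size at least $\lceil |B|/3 \rceil$. The helper name is hypothetical. `pointer[v]` stands for $a(i)$ when $a(i) \in B$ and is `None` otherwise, and every pointer target is assumed to be a key of `pointer`.

```python
def independent_subset(pointer):
    """pointer maps each vertex v to a(v) (another vertex) or None.
    Returns a set of vertices with no edges among them, of size >= ceil(|V| / 3)."""
    nbrs = {v: set() for v in pointer}
    for v, w in pointer.items():
        if w is not None:
            nbrs[v].add(w)
            nbrs[w].add(v)
    # Min-degree elimination: every subgraph has <= |V| edges, so the minimum degree is <= 2.
    deg = {v: len(nbrs[v]) for v in nbrs}
    remaining, order = set(nbrs), []
    while remaining:
        v = min(remaining, key=deg.__getitem__)
        order.append(v)
        remaining.remove(v)
        for w in nbrs[v]:
            if w in remaining:
                deg[w] -= 1
    # Colour in reverse order: each vertex has <= 2 already-coloured neighbours.
    colour = {}
    for v in reversed(order):
        used = {colour[w] for w in nbrs[v] if w in colour}
        colour[v] = min(c for c in range(3) if c not in used)
    classes = [[v for v in colour if colour[v] == c] for c in range(3)]
    return set(max(classes, key=len))


# Two directed 3-cycles and one isolated vertex (|B| = 7).
pointer = {0: 1, 1: 2, 2: 0, 3: 4, 4: 5, 5: 3, 6: None}
Bp = independent_subset(pointer)
print(len(Bp) >= 3, all(pointer[v] not in Bp for v in Bp))  # True True
```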

Now let $C = \{c_i\}_{i=1}^{k} \setminus B'$. By construction, $C$ is at distance $\le 3r^*$ from all points in $S$. Consider the following $d'$: increase all distances by a factor of 3, except $d(a(i), p)$ for $i$ such that $c_i \in B'$ and $p \in C_i$, which we increase to $\min(3r^*, 3d(a(i), p))$. Then by Lemma 3, the optimal radius is $3r^*$. Therefore, the set $C$ achieves the optimal score even though $|C| \le k - 3$. Then we can pick any combination of 3 dummy centers, and they must all result in clusterings that are $\varepsilon$-close to OPT. We will show this is not possible, so there must be a contradiction.

Recall Fact 17 from the previous section. For each cluster $C_i$, there exists a unique point $p \in S$ ranked first for $C_i$, which means that in any optimal set of centers containing $p$, $p$ will always be the center for $C_i$. Call this point $c(i)$. Now we define the following sets. Partition $C$ into $C_{good} = \{c \in C \mid \exists i \text{ s.t. } c = c(i)\}$ and $C_{bad} = C \setminus C_{good}$. Then $|C| = |C_{good}| + |C_{bad}|$. Furthermore, let $D_{good} = \{c \notin C \mid \exists i \text{ s.t. } c = c(i)\}$, the rest of the "good" points. Then $|C_{good}| + |D_{good}| = k$, since there is one good point per cluster. Hence $(k - |D_{good}|) + |C_{bad}| \le k - 3 \Rightarrow |C_{bad}| + 3 \le |D_{good}|$.

In the last section, we found a set of size $k + 2$ such that any subset of size $k$ was a valid set of centers. But now we have a set $C$ of fixed points, which must always be in the set of centers. However, we make the following observation. The points in $C_{good}$ will always be the centers for the same clusters. Thus, they never affect our analysis of which clusters the dummy centers choose; they are irrelevant to our analysis. In fact, if all points in $C$ are good, then we can reduce to the setting in the previous section.

Case 1: $C_{good} = C$, i.e., all centers in $C$ are irrelevant to the analysis of the dummy centers. Let $k' = k - |C|$. Then pick a set $E$ of $k' + 2$ arbitrary points from $S \setminus C$. Given any $E' \subseteq E$ of size $k'$, $C \cup E'$ must create a clustering that is $\varepsilon$-close to OPT. Since $C$ always grabs the same clusters, we can use Lemma 15 to get a contradiction.

Case 2: $|C_{good}| < |C|$. We need $C_{bad}$ to have at least 2 points to find a contradiction. If $|C_{bad}| = 1$, then add an arbitrary point $p \in S \setminus C \setminus D_{good}$ to $C_{bad}$. Then $|C| \le k - 2$, $|C_{bad}| \ge 2$, and we still have $|C_{bad}| + 2 \le |D_{good}|$. For every cluster $C_i$ such that $c(i) \in D_{good}$, there exists a $c \in C_{bad}$ which is ranked highest by $C_i$ among all points in $C_{bad}$. Since $|C_{bad}| < |D_{good}|$, by the Pigeonhole Principle there must exist some $p \in C_{bad}$ which is ranked highest in $C_{bad}$ by two different clusters $C_i$ with $c(i) \in D_{good}$. Call these clusters $C_1$ and $C_2$. Then we obtain a contradiction as follows. Let the set of centers be $C \cup D'$, where $D' \subseteq D_{good}$ with $|D'| = k - |C|$, and $D'$ contains neither $c(1)$ nor $c(2)$. See Figure 4b.

This is possible because $|D'| = k - |C| = k - |C_{good}| - |C_{bad}| = k - (k - |D_{good}|) - |C_{bad}| = |D_{good}| - |C_{bad}| \le |D_{good}| - 2$. Every point $s \in C_{good} \cup D'$ is closest to one cluster, so if $s$ were closest to two clusters, the solution would not be $\varepsilon$-close. And by construction, $p$ is the closest element in $C_{bad}$ to two different clusters, neither of which is hit by $C_{good} \cup D'$. Therefore, the clustering defined by $C \cup D'$ is not $\varepsilon$-close to OPT, and we have a contradiction. □

Theorem 19. Algorithm 2 runs in polynomial time and outputs a clustering that is $\varepsilon$-close to OPT for $(3, \varepsilon)$-perturbation resilient asymmetric k-center instances such that all optimal clusters have size $> 2\varepsilon n$.

Proof. From Lemma 20, we know that all but $x \le 6$ centers are in the symmetric set $A$.

It is possible that the symmetric 2-approximation does not return a solution of radius $\le 2r^*$ for $k' = k - x$ if there are points $p \in C_i$ such that $p \in A$ but $c_i \notin A$ (but at the very least, the algorithm will find a solution of radius $\le 2r^*$ for $k' = k$, since $A$ has a solution of radius $\le r^*$). If such points $p$ are returned as centers by the 2-approximation algorithm, they are problematic for our analysis. However, the next step in the algorithm removes these points by brute force. The brute-force step must succeed for the following reason: there are $k - x$ points in $C$ which are at distance $\le 2r^*$ from the $k - x$ centers in $A$, and thus within $3r^*$ of the corresponding clusters. There are also $x$ points in $S$ which are at distance $\le r^*$ from the final $x$ clusters, namely their centers (and by definition, these $x$ centers were not in $C \subseteq A$).

Finally, we explain why the Voronoi tiling of $C$ must be $\varepsilon$-close to OPT. Create a 3-perturbation in which we increase all distances by a factor of 3, except for the distances from $C$ to all points in their Voronoi tiles, which we increase only up to $3r^*$. Then the optimal score is $3r^*$ by Lemma 3, and $C$ achieves this score. Therefore, by $(3, \varepsilon)$-perturbation resilience, the Voronoi tiling of $C$ must be $\varepsilon$-close to OPT. This completes the proof. □

E Proof of Theorem 21

Theorem 21. Given a center-based clustering instance satisfying weak center proximity and cluster verifiability, Algorithm 3 outputs OPT in polynomial time.

Proof. To prove that the algorithm returns OPT, it suffices to show that every step (b) adds an edge between two points from the same cluster.
We proceed by induction. Assume that on iteration $t$ of the while loop in step 1, $G$ contains no edges between two points from different clusters. Call this graph $G_t$. Now assume towards contradiction that in this round, the last edge added to $A$ is $e = (p, q)$, where $p \in C_i$ and $q \in C_j$, $i \neq j$. Denote by $G'_t$ the graph $G$ just before $e$ is added to $A$. Let $P'$ and $Q'$ be the components of $p$ and $q$ in $G'_t$, respectively ($P'$ and $Q'$ need not be subsets of $C_i$ and $C_j$). WLOG, $f(P') < 0$, or else the merge would not have happened. Denote by $P$ the connected component in $G_t$ that contains $p$. Then $P \subseteq C_i$ by our inductive hypothesis. Furthermore, $c_i \in P'$, since $d(c_i, p) < d(p, q)$ by weak center proximity, so either $c_i$ was already in $P$, or it was added to $P'$ before we added edge $e$. Then by the definition of cluster verifiability, $f(P') < 0$ implies that $C_i \setminus P'$ is nonempty; call it $P''$. Call the component(s) in $G_t$ corresponding to $P''$ $B_1, \dots, B_x$. By our inductive hypothesis, $B_y \subseteq C_i$ for $1 \le y \le x$. By the definition of cluster verifiability, $f(P'') < 0$, so these components must merge with a component outside of $P''$. But by weak center proximity, each point in $P''$ is closer to $c_i$ than to any point from another cluster. Therefore, the algorithm must add an edge between $P''$ and $P$ after $e$ is added to $A$, which contradicts our assumption that $e$ was the last edge added to $A$.

Finally, the runtime of the algorithm is polynomial, since each step involves searching through polynomially many edges. □